Since this is a beginner's guide, let's start with the basics.
Technical SEO is the work of optimizing your website so that search engines like Google can find, crawl, understand, and index your pages. The goal is to get your pages found and to improve their rankings.
Is technical SEO difficult? It depends. The basics are not hard to grasp, but technical SEO can get complex and hard to follow. This guide will keep things as simple as possible.
In this chapter, we will introduce how to ensure that search engines can effectively crawl your content.
Crawlers fetch content from pages and follow the links on those pages to find more pages, which is how they discover ever more content across the web. A few parts of this process are worth understanding.
Crawlers have to start somewhere. Generally, they build a list of all the URLs they find through links on pages. A secondary mechanism is sitemaps, which are created by users or various systems and contain lists of pages, giving crawlers another way to find URLs.
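For illustration, here is a minimal XML sitemap following the sitemaps.org protocol; the example.com URLs and dates are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-1/</loc>
    <lastmod>2023-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/page-2/</loc>
  </url>
</urlset>

You would typically reference the sitemap from robots.txt or submit it in Google Search Console so crawlers can find it.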
All URLs that need to be crawled or recrawled will be prioritized and added to the crawl queue. This is essentially an ordered list of URLs that Google wants to crawl.
The crawler itself is the mechanism that fetches the content of pages.
Processing systems handle tasks such as sending pages to the renderer, which loads a page much like a browser does, and extracting more URLs to crawl from the pages, which we will discuss later.
Rendering works like a browser loading a page, including its JavaScript and CSS files. This is done so that Google can see the content that most users will see.
The index is where the pages Google shows to users are stored.
There are several ways to control the content that can be crawled on your website.
A robots.txt file tells search engines which parts of your site they can and cannot access.
Note that if other pages link to a page blocked in robots.txt, Google may still index it even though it cannot crawl it. This can be confusing, but if you want to keep a page out of the index, refer to this guide and flowchart.
You can use a crawl-delay directive in robots.txt, which many crawlers support, to limit how often they crawl your pages. Unfortunately, Google ignores it; to adjust Google's crawl rate, you need to change the setting in Google Search Console as described here.
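As a sketch, a robots.txt file that blocks one directory for all crawlers and sets a crawl delay for those that honor it could look like this (the /admin/ path is just an example):

User-agent: *
Disallow: /admin/
Crawl-delay: 10

Keep in mind that Google ignores Crawl-delay, and a Disallow rule blocks crawling, not indexing.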
If you want certain users to be able to access a page, but not search engines, then you probably want one of these three options:
- Some kind of login system;
- HTTP authentication, which requires a password for access (see the sketch below);
- An IP whitelist, which only allows specific IP addresses to access the pages.
This type of setup is best suited for internal networks, member-only content, or staging, test, and development sites. It lets a specific group of users access the pages, while search engines cannot reach them and will not index them.
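As an example of the HTTP authentication option mentioned above, here is a minimal sketch for an Apache server using an .htaccess file; the password file path and realm name are placeholders for illustration:

AuthType Basic
AuthName "Restricted area"
AuthUserFile /path/to/.htpasswd
Require valid-user

Visitors, including crawlers, then need a valid username and password from the .htpasswd file before the server will show them the page.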
For Google specifically, the easiest way to see what it is crawling is to use the Google Search Console Crawl Stats report, which gives you more information about how your site is being crawled.
If you want to see all crawl activity on your website, you will need access to your server logs, and possibly a tool to make the data easier to analyze. If your host has a control panel like cPanel, you should be able to access the raw logs along with tools such as Awstats and Webalizer.
Every website has a different crawl budget, which is a combination of how often Google wants to crawl the site and how much crawling the site can handle. More popular pages and pages that change frequently are crawled more often, while pages that appear less popular or are poorly linked are crawled less frequently.
If crawlers notice signs of strain while crawling a website, they typically slow down or even stop crawling until conditions improve.
After pages are crawled, they are rendered and sent to the index. The index is the master list of pages that can show up in search results.
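If you prefer to dig into the raw logs yourself, here is a minimal Python sketch that counts which URLs Googlebot requested most often. It assumes a standard combined log format and a file named access.log, so adjust both for your server:

import re
from collections import Counter

# Path to the raw access log; adjust for your server.
LOG_FILE = "access.log"

# In the combined log format, the request line is quoted, e.g. "GET /page/ HTTP/1.1".
request_re = re.compile(r'"(?:GET|POST|HEAD) (\S+)')

counts = Counter()
with open(LOG_FILE, encoding="utf-8", errors="ignore") as f:
    for line in f:
        # Only look at hits whose user agent mentions Googlebot.
        if "Googlebot" not in line:
            continue
        match = request_re.search(line)
        if match:
            counts[match.group(1)] += 1

# Print the 20 most-crawled URLs.
for url, hits in counts.most_common(20):
    print(hits, url)

Note that anyone can fake the Googlebot user agent, so for anything important you would also want to verify that the requests really come from Google.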
Let's talk about the index.
In this chapter, we will discuss how to ensure your pages are indexed and check how they are indexed.
A robots meta tag is an HTML snippet that tells search engines how to crawl or index a page. It is placed in the <head> section of a webpage and looks like this:
<meta name="robots" content="noindex" />
When there are multiple versions of the same page, Google chooses one to store in its index. This process is called canonicalization, and the URL selected as the canonical is the one Google shows in search results. Google uses many different signals to choose the canonical URL.
The easiest way to see how Google has indexed a page is to use the URL Inspection tool in Google Search Console. It will show you the canonical URL Google has chosen.
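One of the best-known signals is the canonical link tag, which you place in the <head> of the duplicate versions to point at the URL you prefer (the example.com URL here is a placeholder):

<link rel="canonical" href="https://example.com/preferred-page/" />

Google treats this as a hint rather than a directive, so the other signals still matter.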
One of the most difficult things for SEO is determining priorities. There are many best practices, but some changes will have a greater impact on your rankings and traffic than others. Here are some factors I recommend prioritizing.
Make sure the pages you want people to find are indexed in Google. The first two chapters covered crawling and indexing, and that was exactly the point.
You can use the Indexability report in Site Audit to find pages that can't be indexed and the reasons why. The report is free in Ahrefs Webmaster Tools.
Over the life of a website, its URLs often change, and in many cases those old URLs have links pointing at them from other websites. If they aren't redirected to the current pages, those links are lost and no longer count toward your pages. Redirecting old URLs reclaims the lost links, making this one of the quickest link wins available. Here's how to find these opportunities:
Site Explorer -> yourdomain.com -> Pages -> Best by Links -> add a “404 not found” HTTP response filter. I usually sort this by “Referring Domains”.
Here's the result for 1800flowers.com:
Looking at the first URL in archive.org, I can see it was previously a Mother's Day page. By redirecting that one page to the current version, you would recover 225 links from 59 different websites, and there are many similar opportunities across the other pages.
You'll want to use 301 redirects to point the old URLs at the current pages and reclaim that lost link value.
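How you implement the redirect depends on your server. As one common sketch, on Apache you could add a rule like this to your .htaccess file; both paths here are hypothetical:

Redirect 301 /old-mothers-day-page/ https://example.com/mothers-day/

Most CMSs and hosting control panels also offer redirect tools if you can't edit the server configuration directly.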
Internal links are links from one page on your website to another. They help search engines find your pages and help those pages rank better. Site Audit has a report called Link Opportunities that helps you quickly find these internal linking opportunities.
Schema markup is code that helps search engines understand your content better, and it powers many features that can help your website stand out in search results. Google's search gallery shows the various search features and the schema a site needs in order to be eligible for them.
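As a sketch, schema markup is usually added as a JSON-LD script in the page's <head>. This hypothetical example marks a page up as an Article; all of the values are placeholders:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "The Beginner's Guide to Technical SEO",
  "author": {
    "@type": "Person",
    "name": "Jane Doe"
  },
  "datePublished": "2023-01-15"
}
</script>

You can check whether Google can read your markup with the Rich Results Test.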
The items we'll cover in this chapter are all worth paying attention to, but compared with the quick wins in the previous chapter, they may take more work and deliver smaller gains. That doesn't mean you shouldn't do them; it's simply to help you understand how to prioritize your work.
These are minor ranking factors, but they're still worth looking at for the sake of your users. They cover aspects of the website that affect user experience (UX).
Core Web Vitals are speed metrics and are part of the page experience signals that Google uses to measure user experience. These metrics measure: Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and First Input Delay (FID).
HTTPS protects the communication between your browser and the server from being intercepted and tampered with by attackers. This provides confidentiality, integrity, and authentication for the vast majority of today's web traffic. You'll want your pages to load over HTTPS rather than HTTP.
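One common way to make sure pages load over HTTPS is to redirect all HTTP requests. Here is a sketch for Apache's .htaccess, assuming mod_rewrite is enabled:

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Nginx and most hosting panels have equivalent settings, and you'll also need a valid TLS certificate installed first.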
Any website that displays a lock icon in the address bar is using HTTPS.