150k Small Business Website Teardown 2019

Wednesday June 26th

Welcome to the 2019 Small Business Website Teardown Report from Fresh Chalk. I will show you exactly how small businesses excel at creating websites… and how they don’t! I set out to understand which website platforms are the best (and the worst), what drives really fast websites, the state of small business SEO, and how well this stuff correlates with Google SERPs. I’ll even cover everyone’s hot button question - Squarespace, Wix or WordPress?

My analysis turned up so many unbelievable findings that I had to break the report up into a series of blog posts. I’ll publish a new chapter every week or two, starting with Part One, a primer on small business SEO. Here are six of the astounding findings we uncovered that are seriously impacting small business websites.

Key Findings

  • Nearly 2% of small business websites are marked noindex. Blame Squarespace.
  • 25% of all small business websites are missing an H1 tag.
  • Sites that mention the BBB outrank those without a mention by a median 18%.
  • Yelp appears in the top five search results for 92% of Google web queries that consist of a city and business category.
  • Businesses with 4 or more stars on Google My Business outrank those with less than 4 stars by a median 11%.
  • Novelty stat: sites that mention faith outrank those that don’t mention faith by a median 12%.

The Report is Ongoing

I’ll publish the next chapter in a few weeks. If you’ve got feedback, I'm @adamdoppelt. Let me know if there’s more you’d like to see from this report.

Published

1. SEO (this chapter)
2. Website Builders, Speed and Google Rank
3. WordPress

On Deck

4. The Verticals - Real Estate, Auto, Vision, Chiropractors...
5. Social
6. Technical Factors

Why?

Let me begin by explaining why someone would devote countless hours fetching and analyzing hundreds of thousands of small business websites. I’m one of the founders of Fresh Chalk, which helps consumers find local professionals with a little help from their friends. If you need a dentist, electrician, or hair stylist you can find one on Fresh Chalk.

Though our company is young, we already list hundreds of thousands of small businesses in our first few cities. We wanted to make sure we had contact emails for each of these businesses, which led my cofounder Patrick to start crawling these sites. Then we had a thought - wouldn’t it be interesting to analyze these websites to see how well they actually served their businesses? After all, as a team that’s focused on connecting customers with great businesses, we wanted to know just how well small businesses were doing at marketing themselves.

This little idea quickly ballooned into a project of mammoth proportions. We gathered three hundred data points for 150,000 websites, or roughly 1% of US small businesses. I also grabbed PageSpeed scores and SERPs for each and every one. I brushed off my dusty stats textbook to separate the wheat from the chaff. See the Methodology section below for more details.

Now I can reveal how SMB websites are built, easy ways to make them perform better, and how to give yourself a leg up in the battle for search engine rankings.

SEO Basics

This chapter of the report addresses SEO. If you aren’t an SEO insider, some of these terms will be unfamiliar to you. You may want to consult Moz’s Beginner's Guide to SEO by Britney Muller before proceeding.

There are many different ways that consumers find businesses, but all roads lead to Google. Let’s start with a quick check of basic on-page SEO factors:

This chart summarizes how many small business sites are employing some kind of SEO must-have. I’m heartened to see that nearly 100% of these SMB websites have a title tag, but surprised that less than 75% have an <h1> tag.

This chart chronicles the most egregious and fundamental SEO failures. The lack of sitemaps can be excused, but a decent meta description is hugely important for increasing click-through rate. Later in the report you’ll see exactly what kind of Google boost correlates with adding a meta description.

"these findings make me sad"
- an SEO friend

Your Noindex is Showing

Here’s a surefire way to slowly destroy your business: tell Google not to index your website! A noindex tag is an explicit message to Google saying “please don’t index my website.” It should be exceptionally hard to accidentally include a noindex tag. Yet nearly 2% of SMB websites are marked noindex.
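
If you want to check your own home page, here is a minimal Ruby sketch of a noindex check. It is not the code behind this report, and the `noindex?` helper and its regex are illustrative assumptions. It only looks at robots meta tags in the HTML, so it won't catch a noindex sent via an HTTP header.

```ruby
# Matches a robots (or googlebot) meta tag and captures its content attribute.
# Assumption: name= appears before content=, which is common but not guaranteed.
ROBOTS_META = /<meta[^>]*name=["'](?:robots|googlebot)["'][^>]*content=["']([^"']*)["']/i

# Returns true if any robots meta tag on the page contains "noindex".
def noindex?(html)
  html.scan(ROBOTS_META).flatten.any? { |content| content.downcase.include?("noindex") }
end

puts noindex?('<meta name="robots" content="noindex, nofollow">')
```

A regex is a blunt instrument for HTML, but it matches the string-search spirit described in the Methodology section.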

What does this mean for potential customers checking out a business? It means getting a result that looks like the screenshot below. This is a scary result, and no one is going to click it.

Update: Roey pointed out that this type of Google search result is displayed if a page is blocked by robots.txt (not noindex). Noindex should still be avoided, obviously, but the screenshot above is incorrect.

Who’s to blame here? Squarespace, mostly.

It looks like it’s way too easy to check “Hide this page from search results” on Squarespace. What are they thinking?

Hosting is a complicated topic and this chart glosses over some details. For example, GoDaddy has a website builder but also provides managed WordPress. I’ll be directly addressing the various SMB hosting options, CMSs and agencies in part two of this report. The wild world of WordPress will be covered in part three.

The Case of the Missing H1

Let’s dig into those numbers a bit more. Which hosting providers make it easy to have great on-page SEO? For the <h1> tag:

Clearly if you use Google Local’s website builder they make it easy to create an h1 tag with your business name. Not true for Weebly. I was having trouble coming to terms with the 17% number for Weebly, so I created my very own Weebly website and tried to create an h1 tag. I couldn’t figure it out! Apparently I’m not alone.

Who Owns the SERPs?

This little chart says a lot about the current state of SMB organic search. SERP is an industry term that stands for “Search Engine Results Pages”. I’ll show the chart, then break down the numbers.

I ran hundreds of Google searches for city and SMB category keywords in the US. For example, “Houston Home Inspector” or “Denver Eye Doctor”. I used the uule parameter to emulate local search in that city. Important caveat - city-wide uule searches don’t mirror the hyper localness of actual searches, so these results are good for directionality but don’t take them as gospel. The list of top directories is also dependent on our category list. For example, you won’t see Avvo in this chart because we didn’t examine SERPs for legal searches. See Methodology for more details.

Approximately 1 in 4 Google search results matched one of our 150,000 businesses. The rest of the search results were typically directories, news sites, job postings, etc.

If you look at the top five organic search results in Google, which directories appear the most? Yelp is the clear winner, appearing in the top five search results for a whopping 92% of all SMB category searches. This is an astonishing number. Home Advisor and Angie’s List (both owned by IAC) collectively appear in 36% of searches. Thumbtack may be able to justify their huge round by pointing to their admirable 9% number. Perhaps most surprising is Facebook ranking at all given their lack of directory pages.

Yelp’s dominance is tempered somewhat by Google’s continual efforts to drive users to click on Google properties. Also see BrightLocal’s excellent Local Services Ads Click Study to learn where users actually click.

Still, this chart shows that there is room for innovation. There’s no reason that Indeed should be ranking on this list. One day soon, I hope to see Fresh Chalk on this chart.

Yelp Dominates, but Not Everywhere

We’ve established that Yelp rules small business search results. As shown above, Yelp appears in the top five search results for SMB searches 92% of the time. But what if we dig a bit deeper and look at Yelp performance in the top three search results, on a per-city basis?

Turns out that Yelp is crushing it in Seattle and Austin, but it’s less dominant in cities like Atlanta and Indianapolis, creating a vacuum for other participants to fill.

One Weird Trick to Raise Your Google Rank by 17%

Remember that time your drunk SEO friend suggested that you add a meta description to your new website? Turns out that was some pretty good advice:

By running hundreds of generic Google SMB searches and matching them against our huge list of SMB websites, we can quantify something that seems fairly intuitive: SMB websites with a meta description rank 17% higher than websites without. Now, Google stated long ago that meta description is not a ranking factor. However, a decent meta description will increase CTR from the search results, which will ultimately boost your Google Rank.

How about this one?

During our scan I took note of SMB websites that mentioned the Better Business Bureau, either with plain text or a link. Websites which contain a reference to the BBB rank 18% higher than sites without.
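
For readers who want to try this kind of comparison on their own data, here is one possible way to compute a "ranks X% higher" number. This is a sketch under assumptions: it treats rank as a position where lower is better, and it compares the median position of sites with a feature against sites without. The report doesn't spell out its exact formula, so `median_rank_lift` is a hypothetical helper, not the calculation used here.

```ruby
# Median of a non-empty array of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Percent by which the median position of sites WITH a feature beats
# the median position of sites WITHOUT it (lower position = better rank).
def median_rank_lift(with_ranks, without_ranks)
  m_with = median(with_ranks)
  m_without = median(without_ranks)
  ((m_without - m_with) / m_without.to_f * 100).round(1)
end

# Made-up positions, purely for illustration.
puts median_rank_lift([10, 20, 30], [20, 25, 30])
```

A positive number means the "with" group tends to rank better. Like everything else in this chapter, it measures correlation only.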

Star Power

Google My Business is the Google feature where local businesses can add their contact info and photos to Google, and users can write reviews. With the rise of Google My Business, Google has overtaken Yelp to become the most important directory for small businesses.

This gradual encroachment has been accompanied by justified hand-wringing from competitors and our government’s underappreciated antitrust bureaucrats. Let’s see how Google My Business reviews correlate with Google web search rankings for those businesses. I only looked at businesses with more than 10 reviews. I expected a strong trend and I wasn’t disappointed:

Important caveat - only around 2% of businesses are hated enough to sink into 1-3 star land, so the data is sparse on that end. If you are interested in this topic, check out BrightLocal’s Google Reviews study. Note that the BrightLocal study is looking at local search (not web search) but it’s still fascinating stuff.

What about Yelp? Do we think Yelp stars could be a ranking factor for Google? The answer might surprise you:

I was sorta thinking we’d find something here based on all the noise from Yelp, but there is no correlation whatsoever! I’ll tell you what does correlate, though. Number of Yelp reviews:

Don’t get too excited. One of Google’s main ranking factors is “age of domain”, and my guess is that number of reviews correlates directly with the age of the business. That’s probably what we’re seeing in this chart. I doubt that Google is actually using number of Yelp reviews as a ranking factor.

Darren suggested a different reason - increasing number of reviews probably correlates with increased prominence in Yelp search results, which in turn leads to Yelp domain authority passing to your site/business.

Schema Tags

Schema tags improve the way search engines understand and display your website in SERPs. These tags are super popular, and 48% of all SMB websites contain a schema tag of some sort. Let’s take a look at the most common options.

Of the websites that contain schema tags, 25% contain a WebSite tag and 19% contain a WebPage tag. I’m sure we all remember the difference between the two, right?
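
To see which schema types a page declares, a sketch like the following can help. It is an assumption-laden illustration rather than the report's method (which mostly relied on plain string searches): it parses JSON-LD script blocks with Ruby's standard `json` library and also picks up microdata `itemtype` attributes. The `schema_types` and `collect_types` names are made up for this example.

```ruby
require "json"

# Recursively collects "@type" values from parsed JSON-LD.
def collect_types(node, types)
  case node
  when Hash
    Array(node["@type"]).each { |t| types << t if t.is_a?(String) }
    node.each_value { |v| collect_types(v, types) }
  when Array
    node.each { |v| collect_types(v, types) }
  end
end

# Returns a sorted, de-duplicated list of schema.org types found on a page.
def schema_types(html)
  types = []
  ld_json = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi
  html.scan(ld_json).flatten.each do |raw|
    begin
      collect_types(JSON.parse(raw), types)
    rescue JSON::ParserError
      next # skip malformed blocks instead of failing the whole page
    end
  end
  # Microdata, e.g. itemtype="http://schema.org/WebSite"
  html.scan(%r{itemtype=["']https?://schema\.org/([A-Za-z]+)["']}i).flatten.each { |t| types << t }
  types.uniq.sort
end
```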

Personally I’m interested in AggregateRating. Only 2% of SMB websites attempted to use this tag, probably because (as Joy helpfully pointed out) Google discourages AggregateRating on a website’s home page. Actually, that raises an interesting question that I can now answer. Which tags correlate with the largest boost in Google rank?

The AggregateRating schema tag has a strong correlation with improved Google rankings. I bet this is because AggregateRating and its sibling tag, Rating, tend to get added when websites subject themselves to the tender mercies of SEO professionals.

External Links and Dead Ends

Rand suggested I look at small business websites that contain a link to an external website versus those that don’t. I call this the Dead End metric. My first response was skepticism - who builds websites without links? I was wrong, though. It turns out that 5% of small business websites are dead ends, and it correlates strongly with poor rankings:
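
Here is a rough sketch of how a Dead End check could work. The definition used below (no `<a href>` pointing at a different host, ignoring a leading `www.`) is my assumption for illustration, and `dead_end?` is a hypothetical helper rather than the report's actual check.

```ruby
require "uri"

# Lowercases a host and drops a leading "www." so variants compare equal.
def normalize_host(host)
  host.downcase.sub(/\Awww\./, "")
end

# Returns true when no link on the page points to a different host.
def dead_end?(html, site_host)
  hrefs = html.scan(/<a\s[^>]*href=["']([^"']+)["']/i).flatten
  hrefs.none? do |href|
    begin
      host = URI.parse(href).host
    rescue URI::InvalidURIError
      next false # unparseable hrefs are not counted as external links
    end
    host && normalize_host(host) != normalize_host(site_host)
  end
end
```

Relative links, `mailto:` and `tel:` links have no host, so they don't count as external under this definition.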

Have Faith

There are many other unexpected correlations. For example, it turns out that if you mention faith in some fashion on your website, you tend to rank better than other SMB websites:

“Faith” means that the website mentions church, God, the Lord, etc. These words are surprisingly common in the data set. Remember, correlation does not equal causation!

Methodology

A large array of technology was used to create this report. What originally started as a set of scripts and a bloated sqlite file quickly evolved into a full blown Rails app with a cloud instance, a large Postgres instance, and a significant disk cache for crawled content. I used a nice fat machine so that most of the crawl cache would fit into memory.

The crawl cache is 30 GB of data, with another half gig for SERPs. The Postgres data is 100 MB, with 250k rows in the main table. The crawl was executed with homegrown Ruby code driving Curl. Three proxies were used for the SERPs. Google PageSpeed data was fetched slowly due to API limits imposed by Google. I also tracked timing data produced by Curl. It’s very easy to parallelize crawls using the excellent parallel gem.
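
To give a feel for what a parallel crawl looks like, here is a self-contained sketch. Note the swap: it uses plain Ruby threads from the standard library instead of the parallel gem, so it runs without installing anything. The `curl_fetch` and `crawl` helpers, the worker count, and the 20 second timeout are all assumptions for illustration, not the report's code.

```ruby
require "open3"

# Fetches a URL by shelling out to curl. Returns the body, or nil on failure.
def curl_fetch(url)
  body, status = Open3.capture2("curl", "--silent", "--location", "--max-time", "20", url)
  status.success? ? body : nil
end

# Crawls urls with a small pool of worker threads.
# fetcher is injectable so the crawl logic can be exercised without network access.
def crawl(urls, workers: 8, fetcher: method(:curl_fetch))
  queue = Queue.new
  urls.each { |url| queue << url }
  results = {}
  lock = Mutex.new
  threads = Array.new(workers) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        body = fetcher.call(url)
        lock.synchronize { results[url] = body }
      end
    end
  end
  threads.each(&:join)
  results
end
```

Calling `crawl(["https://example.com"])` returns a hash of URL to response body. Threads work well here because the workers spend nearly all their time waiting on the network.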

I collected roughly 300 data points for each website. The vast majority of the data points use a simple string search. For example, if the web page contains “olark” I assume they use Olark Live Chat. Some checks require a regex search, like the one for meta description. Crawl responses are forced into UTF-8 and converted to lowercase for convenience.
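
The approach described above can be sketched in a few lines of Ruby. The three checks below are a tiny illustrative subset, and the meta description regex is my guess at a reasonable pattern, not the exact one used for the report.

```ruby
# Forces a raw crawl response into valid UTF-8 and lowercases it.
def normalize_body(raw)
  raw.dup.force_encoding("UTF-8").scrub("").downcase
end

# Illustrative checks: plain strings use include?, regexes use match?.
CHECKS = {
  olark: "olark",
  h1: "<h1",
  meta_description: /<meta[^>]*name=["']description["'][^>]*content=["'][^"']+["']/
}.freeze

# Returns a hash of data point name => true/false for one page.
def data_points(raw)
  body = normalize_body(raw)
  CHECKS.transform_values do |needle|
    needle.is_a?(Regexp) ? body.match?(needle) : body.include?(needle)
  end
end
```

Lowercasing once up front means every check can be written in lowercase, and `scrub` drops invalid bytes so the searches never raise on a badly encoded page.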

The list of SMB websites was extracted from Fresh Chalk. I also kept track of the business cities, categories, contact info, and star ratings so I could analyze the data later.

A set of scripts drives the crawling, parsing, PageSpeed fetching, and reporting. I know from experience with other crawling projects that it pays to think of this process as a pipeline, with each step transforming and caching the data. As my needs evolve I can tweak each stage of the pipeline and run the appropriate script again.
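
As a rough illustration of the pipeline idea, here is a sketch of a stage that caches its output on disk and is skipped on later runs. The `cached_stage` helper and the JSON-file-per-stage layout are assumptions made for this example. The real pipeline is a set of separate scripts.

```ruby
require "json"
require "fileutils"
require "tmpdir"

# Runs the block once, caches its JSON-serializable result under cache_dir,
# and returns the cached copy on every later call.
def cached_stage(cache_dir, name)
  path = File.join(cache_dir, "#{name}.json")
  unless File.exist?(path)
    FileUtils.mkdir_p(cache_dir)
    File.write(path, JSON.generate(yield))
  end
  JSON.parse(File.read(path))
end

# Usage: the parse stage consumes the crawl stage's cached output.
Dir.mktmpdir do |dir|
  pages  = cached_stage(dir, "crawl") { { "example.com" => "<h1>Hello</h1>" } }
  parsed = cached_stage(dir, "parse") { pages.transform_values { |html| html.include?("<h1") } }
  puts parsed.inspect
end
```

To re-run a stage after tweaking it, delete that stage's cache file and everything downstream of it.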

The reports emerge as CSVs. I preview the CSVs with xsv, commit them, and then import into Google Sheets for charting and analysis.

I use rsync to occasionally pull the production crawl onto my local machine. It’s faster to iterate when everything is local. I’d also like to give a shout out to the amazing and little known codesearch tool from Russ Cox. I don’t use codesearch in my data pipeline, but it’s great for running quick checks against the full crawl. I started with ripgrep and then quickly moved to codesearch as the dataset grew.

The Process

The list of SMB websites was pulled from Fresh Chalk, which at the time was quietly available in Atlanta, Austin, Boston, Denver, Houston, Indianapolis, Portland, and Seattle. This resulted in a list of 250k SMB websites. After eliminating dead and parked websites that number dropped to 200k. Many franchises have multiple businesses on a single website, so I carefully arranged to sample each individual website only once. That winnowed the data set down to 150k websites.
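
One simple way to sample each website only once is to key every business by its normalized host and keep the first business per host. That rule is an assumption for illustration (the actual de-duplication may have been more careful), and `site_key` and `one_per_website` are hypothetical names.

```ruby
require "uri"

# Normalized host for a URL, or nil if the URL can't be parsed.
def site_key(url)
  host = URI.parse(url).host
  host && host.downcase.sub(/\Awww\./, "")
rescue URI::InvalidURIError
  nil
end

# Keeps the first business seen for each distinct website.
def one_per_website(businesses)
  businesses
    .select { |b| site_key(b[:url]) }
    .uniq { |b| site_key(b[:url]) }
end

list = [
  { name: "Franchise Seattle", url: "https://www.franchise.example/seattle" },
  { name: "Franchise Austin",  url: "http://franchise.example/austin" },
  { name: "Solo Shop",         url: "https://solo.example" }
]
puts one_per_website(list).map { |b| b[:name] }.inspect
```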

I added incremental data points by seeking out interesting words or urls in the crawl data. Strings that cropped up often were candidates for inclusion in the report. Most of the data points were collected using string/regex searches as detailed above. This has limitations with false positives, but I was careful to structure the searches appropriately. For example, searching for <h1 instead of just h1 will produce a more accurate measurement of h1 tags.

WordPress plugins were determined by searching for wp-content/plugins, with additional checks for popular plugins like woo, yoast, etc.
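
A sketch of the wp-content/plugins search might look like this. The `wordpress_plugins` helper is hypothetical, and it only finds plugins that load assets from their own plugin directory on the page being scanned.

```ruby
# Extracts plugin directory names from asset URLs such as
# /wp-content/plugins/woocommerce/assets/js/frontend.js
def wordpress_plugins(html)
  html.scan(%r{wp-content/plugins/([a-z0-9_-]+)}i)
      .flatten
      .map(&:downcase)
      .uniq
      .sort
end

page = '<link href="/wp-content/plugins/wordpress-seo/css/a.css">' \
       '<script src="/wp-content/plugins/woocommerce/assets/js/b.js"></script>'
puts wordpress_plugins(page).inspect
```

Plugin directory names don't always match the marketing name. Yoast SEO, for example, lives under `wordpress-seo`, which is one reason extra checks for popular plugins are useful.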

Figuring out the host and/or CMS was an important part of this project. Most providers left clues in the HTML which I was able to easily suss out. If one (and only one) of these clues is present, I assume the website is using a particular host/CMS. I also spent a ton of time analyzing “powered by” phrases. I didn’t have time for DNS, MX or whois. Maybe next time.
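
The "one and only one clue" rule can be sketched like so. The clue strings below are illustrative guesses, not the list used for the report, and `detect_host` is a made-up name.

```ruby
# Hypothetical clue strings, one per provider. A real list would be much longer.
HOST_CLUES = {
  "squarespace" => "static1.squarespace.com",
  "wix"         => "static.wixstatic.com",
  "weebly"      => "weebly.com",
  "wordpress"   => "wp-content"
}.freeze

# Returns the provider name only when exactly one clue matches, otherwise nil.
def detect_host(html)
  body = html.downcase
  matches = HOST_CLUES.select { |_provider, clue| body.include?(clue) }.keys
  matches.size == 1 ? matches.first : nil
end
```

Refusing to guess when two clues match trades coverage for accuracy: some sites go unclassified, but fewer get assigned to the wrong provider.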

I ran 500 Google searches and looked for business websites in the top 100 results. Each search consisted of a city name and business category like “Boston Acupuncture”. Based on Darren’s excellent suggestion, I used the uule parameter to localize the search to that city. This won’t give hyperlocal results, but it’s good enough for general ranking clues. I was able to match around 11k business websites from the SERPs.
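
Matching search results back to known businesses can be sketched as a host lookup. This assumes matching on normalized host alone, which is my simplification, and `result_host` and `match_serp` are hypothetical helpers.

```ruby
require "uri"
require "set"

# Normalized host for a result URL, or nil if it can't be parsed.
def result_host(url)
  host = URI.parse(url).host
  host && host.downcase.sub(/\Awww\./, "")
rescue URI::InvalidURIError
  nil
end

# result_urls: organic result URLs for one query, in rank order.
# business_hosts: Set of normalized hosts for known business websites.
# Returns [{ rank:, host: }] for each result that matches a known business.
def match_serp(result_urls, business_hosts)
  result_urls.each_with_index.map do |url, index|
    host = result_host(url)
    { rank: index + 1, host: host } if host && business_hosts.include?(host)
  end.compact
end

known = Set["bostonacupuncture.example"]
serp = ["https://www.yelp.com/search?find_desc=acupuncture", "https://bostonacupuncture.example/"]
puts match_serp(serp, known).inspect
```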

We have around 75 primary business categories on Fresh Chalk. We cover most of the SMB world, but we haven’t gotten around to everybody yet. For example, we don’t have a lot of lawyers on Fresh Chalk. That’s also why you won’t find Avvo in our list of top 3rd party directories - we didn’t search for lawyers.

Correlation Does Not Equal Causation

An important caveat - most of my conclusions are correlation, not causation. Correlation does not equal causation:

Thanks

Congrats on making it to the end of this long-winded and meandering post! Hopefully you managed to learn something.

A few patient friends helped guide the report by asking pointed questions and reading drafts. I’d like to send a big thanks to Dana DiTomaso, Darren Shaw, David Mihm, Joy Hawkins, Laura Troyani, Rand Fishkin, Rob Ousbey, and Russ Jones for their assistance producing the report. To quote my D&D group, the real treasure was the friends we made along the way.
