A $9B AI fail

Nov 12, 2021

I’ve been teaching AI for 6 years. One of my favorite examples is house price predictions: I tell students that they need to develop an algorithm to predict the future sale price of a house and ask them to consider what features to use. In the end, I always tell my audience that this is not just an exercise, and there is real business value in this apparently simple machine learning task.

Specifically, the value of a good ML algorithm is $1,200,000. This is how much an American company paid to whoever developed the best algorithm for the task. The company in question is Zillow, an online real estate marketplace (basically eBay for homes).

Zillow bragged a lot about its AI algorithms. Every time a house is listed for sale, its AI suggests the right asking price to the owner, so homeowners have a quick reference they can use to price their home.

But Zillow became so confident in its algorithms that it thought it could jump into a new business known as iBuying. Here’s how it works:

Homeowners offer to sell their homes directly to Zillow
Zillow uses its algorithms to figure out if it’s a good deal
If that’s the case, Zillow buys the property right away
Zillow makes some simple renovations
Zillow sells it back for a profit

At some point, Zillow predicted it was going to buy 5,000 homes a month by 2024. That’s not happening anymore:

“The company announced that its home-buying division, Offers, had lost more than $300m over the last few months. Offers will now be shut down and about 2,000 people laid off. Zillow reportedly has about 7,000 homes that it now needs to unload; many for prices lower than it originally paid.” (The Guardian)

As a result, Zillow’s stock tumbled and lost roughly $9B in market cap.

What happened? Zillow’s CEO Rich Barton largely blamed the Data Science team. He said: “Fundamentally, we have been unable to predict future pricing of homes to a level of accuracy that makes this a safe business to be in.” So, basically, their AI wasn’t good enough.

I want to go a bit deeper. I think there are several reasons why the project failed.

1. The housing market is not a stable environment

House prices in the US have been going up for the past 10 years, and surged after the pandemic. It’s easy to make money in an environment where everything goes up, and it’s hard to spot inaccuracies in your algorithms.

What happens if the trend changes (and it did)? You have to find out fast, or… you lose $9B in market cap 🙂

As real estate tech strategist Mike DelPrete said:

“Zillow missed the offramp. It’s like you’re driving on the highway and you see brake lights ahead of you. You take your foot off the gas, you pump the brakes. Zillow didn’t do either of those things until it was too late.”

2. Supply chain issues made renovation slow

Let’s suppose you plan to renovate a house and flip it in 2 months. Your algorithms tell you that you’ll make money. But what if it takes 6 months instead? Will you still be able to sell at a profit? Who the hell knows. With the current supply chain shortages, it took Zillow longer than expected to renovate homes, and therefore to sell them.
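
To make the 2-months-versus-6-months question concrete, here is a minimal sketch of the arithmetic. Everything in it is my own assumption for illustration: the function name `flip_profit`, the 1% monthly holding cost, the 0.5% monthly price decline, and the dollar figures. None of it comes from Zillow.

```python
def flip_profit(purchase_price, renovation_cost, sale_price_today,
                months_held, monthly_holding_rate=0.01,
                monthly_price_change=0.0):
    """Profit on one flip. All inputs are illustrative assumptions."""
    # Holding costs (financing, taxes, insurance, upkeep) accrue every month
    # the house sits in inventory.
    holding_cost = purchase_price * monthly_holding_rate * months_held
    # The market keeps moving while you renovate: compound the monthly change.
    sale_price = sale_price_today * (1 + monthly_price_change) ** months_held
    return sale_price - purchase_price - renovation_cost - holding_cost


# Same house, same renovation budget, different time to sell.
for months in (2, 6):
    profit = flip_profit(400_000, 15_000, 430_000, months,
                         monthly_price_change=-0.005)
    print(f"{months} months held: profit = ${profit:,.0f}")
```

With these made-up numbers the 2-month flip ends up slightly positive and the 6-month flip clearly negative. The model that priced the house didn't get any worse in between; the delay alone ate the margin.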

3. Adverse selection

In Zillow’s process, people would not list their homes on the platform but offer them directly to Zillow. What kind of people would do that? Maybe people who need money fast, but maybe also people who know their house has skeletons in the closet and figure it’ll be easier to sell to an algorithm than to a human.

In an ideal situation, Zillow would have been offered a mix of good homes and bad homes. Due to the nature of the business model, it was far more likely to end up with the bad ones.

4. AI performance isn’t business performance

Let’s suppose your average error in predicting the price of a house is 2%. This means that if the correct price of a house is $500,000, a typical prediction lands somewhere between $490,000 and $510,000.

You may think that on average you’re good: sometimes you overpay, sometimes you underpay. The problem is that the real world doesn’t reflect your fancy models. When you err on the low side (you offer $490k rather than $500k), the homeowner won’t sell you the house (assuming they realize the price is low). If your models overshoot and offer $505k instead, owners will happily sell you their house, and you’ll overpay.
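
The asymmetry described above is easy to simulate. This is a toy sketch under strong assumptions of mine: the pricing error is Gaussian with a 2% standard deviation, every home is truly worth $500,000, and sellers know that value exactly and accept only offers at or above it. The function name `simulate_offers` is hypothetical.

```python
import random


def simulate_offers(n_homes=100_000, true_value=500_000, error_sd=0.02, seed=0):
    """Compare the model's error on all offers vs. on accepted offers only."""
    rng = random.Random(seed)
    all_errors = []
    accepted_errors = []
    for _ in range(n_homes):
        # Unbiased model: the offer is the true value plus symmetric noise.
        offer = true_value * (1 + rng.gauss(0, error_sd))
        error = offer - true_value
        all_errors.append(error)
        # Assumption: the seller knows the true value and only accepts
        # offers that are at least that high.
        if offer >= true_value:
            accepted_errors.append(error)
    mean_all = sum(all_errors) / len(all_errors)
    mean_accepted = sum(accepted_errors) / len(accepted_errors)
    acceptance_rate = len(accepted_errors) / n_homes
    return mean_all, mean_accepted, acceptance_rate


mean_all, mean_accepted, rate = simulate_offers()
print(f"Mean error over all offers:      ${mean_all:,.0f}")
print(f"Mean error over accepted offers: ${mean_accepted:,.0f}")
print(f"Share of offers accepted:        {rate:.0%}")
```

Over all offers the error averages out to roughly zero, which is what your validation metrics will show. Over the offers that actually turn into purchases, the error is positive by construction: you only buy the houses you overpriced. Real sellers are not perfectly informed, so the real effect is softer than this, but it points in the same direction.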

5. Algorithms don’t know everything…

When I lived in San Francisco I had an apartment in a neighborhood called Nob Hill. Wikipedia says that Nob Hill is “known for its numerous luxury hotels and historic mansions”. Nob Hill is also right next to the Tenderloin, which Wikipedia reports as having “among the highest levels of homelessness and crime in the city”.

That meant that moving literally ten house numbers down the street could change your neighbor from a tech entrepreneur to a heroin addict.

Algorithms often approximate phenomena with smooth curves, and that’s not always a good idea when reality has sharp edges.
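
Here is a toy illustration of what smoothing does at a sharp neighborhood boundary. The street, the prices, and the "average of the nearest neighbors" estimator are all invented for this sketch; it is not how Zillow's models worked, just the simplest smooth estimator I could think of.

```python
def street_prices(n_houses=40, boundary=20, high=1_500_000, low=600_000):
    """A hypothetical street: expensive homes up to the boundary, cheap after."""
    return [high if i < boundary else low for i in range(n_houses)]


def smooth_estimate(prices, index, window=5):
    """Estimate a home's price as the mean of its neighbors within
    `window` doors on either side, excluding the home itself."""
    lo = max(0, index - window)
    hi = min(len(prices), index + window + 1)
    neighbors = [prices[j] for j in range(lo, hi) if j != index]
    return sum(neighbors) / len(neighbors)


prices = street_prices()
errors = [smooth_estimate(prices, i) - prices[i] for i in range(len(prices))]

for i in (0, 15, 19, 20, 39):
    print(f"house {i:2d}: true ${prices[i]:,}  error ${errors[i]:+,.0f}")
```

Far from the boundary the estimate is exact. For the two homes right at the boundary it is off by $450,000 in opposite directions: the smoother thinks both are worth the average of the two neighborhoods. A model can only see the edge if some feature encodes it.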

6. Bad data science?

You can’t judge a data science team from a job posting, but hey:

For non-data scientists: Prophet is an open-source library made by Facebook that promises to let you “get a reasonable forecast on messy data with no manual effort”. I also drank Facebook’s Kool-Aid and used Prophet with my clients, only to realize that it’s good if you have a gun to your head and 10 minutes to make a model and show a nice graph to your CEO. But if you need to bet your company on a model’s success, stay away (or maybe start from there, and move past it).

I find it interesting that the only tool mentioned in a job posting for a “Senior Data Scientist” is Prophet. We can’t assume that their entire system relied on it, but it’s a warning of the dangers behind the productization of ML: fixing a business problem is much more than fitting a curve to some data.

Some reflections

I strongly believe that AI is one of the most powerful tools we have ever had available. It can be used to cure diseases, free us from boring tasks, and if you’re a business you can also use it to make money!

But I never believed that this was going to be easy. I often say that AI is a tool, like a hammer. You can use a hammer to make a statue, build a home, or hit your thumb while you try to drive a nail into a wall.

So don’t focus too much on how cool and shiny the hammer is. Focus on what you want to build with it, and please pay attention to how you hit that nail.

Update 04 Dec 2021

I wrote this article right after the Zillow crash. In the last couple of weeks, some really smart people chipped in and gave me some new perspectives. So here are two new points:

7. Don’t overfit your model to your corporate strategy

It turns out that Zillow’s models made them too much money, so they “fixed” them to make less money. Yes, you read that right.

According to a Wall Street Journal investigation, in the early days of its iBuying program, Zillow’s models were suggesting very low prices for homes. This meant that only 10% of offers got accepted, but Zillow made tons of money on the ones that did.

That wasn’t Zillow’s corporate strategy. Zillow “expected to make money primarily from transaction fees and from services such as title insurance — not from making a killing on the flip”.

So even though the KPI of “return per transaction” was through the roof, that didn’t matter. What mattered for their strategy was “number of transactions”, and that wasn’t going well.

To fix the “problem” (if “making too much money” is a problem, I’d gladly have it) Zillow decided to overfit its models to its corporate strategy. This meant tweaking the algorithms to pay more for each house, sometimes even overriding the price recommendations by hand.

And it worked! Zillow bought more than 3,800 homes in Q2 2021, more than double the previous quarter. In the third quarter, it bought 9,680 homes (yay, target met!). But it also meant that they overpaid, netting an average loss of 6% per transaction. In Phoenix, the median price Zillow paid for homes went from $351,000 in May to $475,000 in September.
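
The volume-versus-margin trade-off in this story can be sketched in a few lines. This is a toy model built entirely on my own assumptions: every home is truly worth $400,000, the model's estimate has a 3% Gaussian error, sellers accept a discount of up to 10% for the convenience of a quick sale, and the house resells at its true value with no fees or costs. The `offer_multiplier` knob is a hypothetical stand-in for "tweaking the algorithms to pay more".

```python
import random


def run_policy(offer_multiplier, n_homes=50_000, true_value=400_000,
               error_sd=0.03, max_discount=0.10, seed=1):
    """Return (acceptance rate, mean margin per accepted deal) for one policy."""
    rng = random.Random(seed)
    accepted = 0
    total_margin = 0.0
    for _ in range(n_homes):
        # The model's noisy estimate of what the home is worth.
        estimate = true_value * (1 + rng.gauss(0, error_sd))
        # Assumption: each seller would take some discount for a fast sale.
        reservation = true_value * (1 - rng.uniform(0, max_discount))
        # The strategy knob: scale every offer up or down.
        offer = estimate * offer_multiplier
        if offer >= reservation:
            accepted += 1
            # Assumption: the home resells at its true value, no costs.
            total_margin += (true_value - offer) / offer
    rate = accepted / n_homes
    mean_margin = total_margin / accepted if accepted else 0.0
    return rate, mean_margin


for multiplier in (0.92, 1.00, 1.05):
    rate, margin = run_policy(multiplier)
    print(f"multiplier {multiplier:.2f}: "
          f"accepted {rate:.0%}, mean margin {margin:+.1%}")
```

Turning the knob up buys you more transactions and costs you margin on every one of them, and past a certain point the margin goes negative. Nothing about the underlying model changed; only the KPI you chose to optimize did.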

This reminds me a lot of the lyrics of Metallica’s “King Nothing”: careful what you wish, you may regret it, careful what you wish, you just might get it.

8. An ML algorithm is not a business model

Zillow was (and still is) a very successful marketplace. A side-effect of being a successful marketplace is having lots of data. A classic question I hear from people with lots of data is “what else can I do with it?”. I understand how tempting it is to see data as a competitive advantage over companies in the home-buying business and decide to enter it as well.

The problem is that home-buying and risk underwriting are pretty different from showing house listings on a web page. As I tell execs who are considering embarking on a data science transformation, “old business + fancy algorithm ≠ new business”.

Steven Buccini wrote a great blog post on this issue (with the great title “Zillow did not have metallic balls”), and I highly encourage you to read the whole thing. But here’s a passage I think represents the problem pretty well:

“You only find out what works when you use the data that describes your own system, and that means processing a lot of transactions and bracing for impacts because a lot of the transactions are gonna go sour. One of the things that happen for a brand-new launched credit card: done right, you lose about 50% of the dollar volume in the first several months which is terrifying because it’s half the money, literally. […] The only way of building a successful anti-fraud and risk underwriting system is rigor and for lack of a better term, balls of steel.”

I see this issue with many Silicon Valley tech people: their “disrupt everything” mindset leads them to think that their knowledge of tech is a shortcut to success. But again, ML is just a tool. Your tool may be the shiniest in the industry, but if you don’t know the game you’re playing you’re just gonna smash your own finger.

Originally published at https://blog.gianlucamauro.com on November 12, 2021.