We can’t be the only ones to have noticed that every time we have to complete a CAPTCHA these days, it usually involves identifying road signs or traffic lights.

All things that the AI behind self-driving cars would need to recognise.

And given that machine learning typically requires a stack of human-verified data to train the model, could Google secretly be using all of us to help build the software behind its self-driving cars?

Select all squares with traffic lights. Is Google using us to train its self-driving cars?

Wait. What's a CAPTCHA Again?

The CAPTCHA (or Completely Automated Public Turing Test to tell Computers and Humans Apart) has been around for almost 20 years, protecting comment forms and logins across the web from malicious bots trying to post spam or break into your account. While the earliest versions just involved picking out a few squiggly letters on a fuzzy background (something that is usually very easy for humans to do, but was much harder for bots to do at the time), that all changed in 2007 when Luis von Ahn of Carnegie Mellon University invented reCAPTCHA.

The idea was to take advantage of the otherwise wasted global human effort expended interpreting distorted letters and use it for good: digitising books. While Optical Character Recognition could already do a reasonable job of recognising the text in old books, there were still many words it just couldn't figure out. reCAPTCHA was an ingenious solution to the problem.

Those earliest reCAPTCHAs worked by showing you two words, one of which was already known and one that had not been identified. If enough humans gave the same response to the unknown word, that would become the accepted answer.
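The two-word scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual implementation: the names `UnknownWord` and `submit`, and the thresholds `MIN_VOTES` and `MIN_AGREEMENT`, are all hypothetical and chosen purely for the example.

```python
from collections import Counter
from typing import Optional

# Assumed thresholds for illustration; the real values are not given in this post.
MIN_VOTES = 3          # how many humans must answer before we trust a result
MIN_AGREEMENT = 0.75   # share of votes the most popular answer must reach


def normalise(word: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return word.strip().lower()


class UnknownWord:
    """Collects human guesses for a word that OCR could not read."""

    def __init__(self) -> None:
        self.votes: Counter = Counter()

    def record(self, guess: str) -> None:
        self.votes[normalise(guess)] += 1

    def accepted_answer(self) -> Optional[str]:
        """Return the consensus answer, or None if there is no consensus yet."""
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        answer, count = self.votes.most_common(1)[0]
        return answer if count / total >= MIN_AGREEMENT else None


def submit(control_answer: str, control_guess: str,
           unknown_guess: str, unknown: UnknownWord) -> bool:
    """Pass the user only if they get the known (control) word right.

    A correct control word suggests a human is typing, so their guess
    for the unknown word is counted as a vote.
    """
    if normalise(control_guess) != normalise(control_answer):
        return False
    unknown.record(unknown_guess)
    return True
```

In this sketch, a user who mistypes the known word is rejected and their guess for the unknown word is thrown away, so only people who have already shown they can read distorted text get a say in the final answer.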

Within a few months, millions of people around the world helped reCAPTCHA to successfully digitise 20 years of the New York Times archive, with humans deciphering over 440 million words in the process. Google purchased reCAPTCHA in 2009 and over the next few years it helped digitise thousands of books and build up the Google Books library.

A few years later, people started to notice that one of the words in the reCAPTCHAs they were solving had been replaced by what looked very much like numbers from buildings in Google Street View.

Street View CAPTCHA Circa 2012

In news that came as a surprise to no one, that’s exactly what they were:

We’re currently running an experiment in which characters from Street View images are appearing in CAPTCHAs. We often extract data such as street names and traffic signs from Street View imagery to improve Google Maps with useful information like business addresses and locations.

Based on the data and results of these reCAPTCHA tests, we’ll determine if using imagery might also be an effective way to further refine our tools for fighting machine and bot-related abuse online.

Google Spokesperson (2012)

But times change, and reCAPTCHA has continued to evolve. That’s partly because, inevitably, the bots got better. As Artificial Intelligence comes on in leaps and bounds, developing tests that are easy for humans but hard for computers is an ongoing arms race, with the computers constantly catching up (and overtaking us). In 2014, Google tested one of its machine learning algorithms on some of the most distorted text CAPTCHAs. Humans got the test right 33% of the time. The bots scored 99.8%.

Google’s latest version of reCAPTCHA uses various cues about your behaviour, including how you move your cursor around the screen. In some cases you might not see a CAPTCHA box on screen at all. Or you might only need to tick that cute little box marked “I’m not a robot”...

I’m Not A Robot

Or, in some cases, you might have to tick some boxes to identify various items in an image. Which brings us back to clicking on traffic lights, stop signs, and other things that you’d typically see while driving.

So, is Google using us to train its self-driving cars? Well, Google does admit on its reCAPTCHA developer website that: “reCAPTCHA makes positive use of this human effort by channelling the time spent solving CAPTCHAs into digitising text, annotating images, and building machine learning datasets”.

We’ll never know for sure, but given the history it seems highly likely that the data points generated by humans solving CAPTCHAs are being used in some way to power the future of self-driving cars.

Something to remember in a decade or so when self-driving cars are ubiquitous. In some small way, we all helped make it happen.

Well. Most of us did...

Matt Armstrong

With over two decades' experience in the technology industry, Matt is WingArc Australia's manager of marketing and communications.