What are CAPTCHAs and why do we need them?

With the proliferation of the internet came the menace of bots. CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) were introduced as a countermeasure to differentiate between genuine human users and automated scripts (bots). This article explores how CAPTCHAs work, the difficulties in designing an effective CAPTCHA, and how they must evolve to stay one step ahead of bots while protecting our privacy.

What do CAPTCHAs protect us from?
Types of CAPTCHAs
How do CAPTCHAs work?
Criticisms
The future of CAPTCHAs
Summary

What do CAPTCHAs protect us from?

CAPTCHAs are a vital part of internet security, protecting against:

Brute force attacks: Without CAPTCHAs, bots can repeatedly attempt to log in to websites, cycling through countless combinations of usernames and passwords until they gain access.
Form spam: Bots can submit forms on websites, such as contact forms or comment sections, with spam content. CAPTCHAs prevent this by requiring a human-like interaction before form submission.
Web scraping: Some bots are designed to scrape or steal content from websites. CAPTCHAs can deter these bots, protecting website content.
Fake sign-ups and mass account creation: On many platforms, creating accounts en masse is useful for spamming or other malicious activities. CAPTCHAs add a human verification step to every account creation, making mass account creation inefficient for malicious actors.
Resource exhaustion: Bots can repeatedly access a website, consuming significant server resources and slowing down the site or even causing it to crash. By serving as a first line of defense, CAPTCHAs help ensure that only genuine users consume these resources.

CAPTCHAs serve as gatekeepers on the web, filtering out automated threats while allowing genuine human users to proceed. They play an indispensable role in protecting online platforms from many potential threats and abuses.

Types of CAPTCHAs

There are many types of CAPTCHAs, each relying on a specific interaction that is meant to be easy for a human to perform but hard for a bot.

Text-based CAPTCHAs: These display distorted letters and numbers you must identify and type out. The distortions are made in a way that machines find hard to recognize, but humans can decipher with relative ease.
Image CAPTCHAs: You are presented with a series of images and asked to select those that match a specific description (for example, “Select all images with traffic lights”).
Math CAPTCHAs: These show simple math problems that you have to solve, like basic addition or subtraction.
Time CAPTCHAs: These challenges are as simple as reading the time on an analog clock.
Interactive CAPTCHAs: These ask you to perform a task such as dragging and dropping items or following a simple instruction, for example, “Slide to the right.”
Behavioral CAPTCHAs: These look at behavior such as mouse movement and past activity to detect bot-like behavior from page load.

In addition, audio CAPTCHAs are typically provided to help the visually impaired solve the challenge. You listen to a series of spoken letters or numbers and then type them out.

Moreover, a relatively new type of CAPTCHA, the “cryptographic CAPTCHA,” has been developed in recent years. It is based on proof of work: the browser is given computational challenges of adjustable difficulty and must provide an answer before it can proceed.

For example, the leading zero challenge requires your computer to find an input value that, when hashed, produces an output with a specific number of leading zeros.
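
As a minimal sketch of the leading zero challenge, the Python below searches for a nonce whose SHA-256 hash starts with a given number of zero bits. The choice of SHA-256, the way the nonce is appended to the challenge, and the function names are assumptions made for illustration. They are not taken from mCaptcha, Friendly Captcha, or any other product.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a non-zero byte, bit_length() tells us how many leading zeros it has.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check a claimed solution."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: try nonces until one hashes to enough leading zero bits."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical challenge value; a real server would send fresh random bytes.
challenge = b"example-challenge"
nonce = solve(challenge, difficulty=12)  # about 2**12 = 4096 attempts on average
print(verify(challenge, nonce, difficulty=12))  # True
```

Raising the difficulty by one bit doubles the average work for the solver, while verification always costs the server a single hash.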

Recently, mCaptcha and Friendly Captcha have emerged in this space. However, relying only on computational challenges is a risky strategy. While these challenges are unobtrusive, they depend on your device’s computing ability. If your device is too slow, the user experience can be suboptimal, as you might have to wait many seconds for the challenges to complete. On the other hand, powerful servers used by a spammer would have no difficulty solving these challenges relatively quickly.
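
The gap between slow devices and fast servers can be made concrete with rough arithmetic. Finding a hash with k leading zero bits takes about 2^k attempts on average, so the waiting time is that number divided by the device's hash rate. The two hash rates below are hypothetical figures chosen only to show the gap. They are not measurements of any real device.

```python
def expected_attempts(difficulty_bits: int) -> int:
    """Average number of hashes needed to find `difficulty_bits` leading zero bits."""
    return 2 ** difficulty_bits


def expected_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    """Average solving time for a device with the given hash rate."""
    return expected_attempts(difficulty_bits) / hashes_per_second


# Hypothetical hash rates (hashes per second), for illustration only.
SLOW_PHONE = 50_000
ATTACKER_SERVER = 50_000_000

for bits in (16, 20, 24):
    phone = expected_seconds(bits, SLOW_PHONE)
    server = expected_seconds(bits, ATTACKER_SERVER)
    print(f"{bits} bits: phone {phone:.2f}s, server {server:.4f}s")
```

Under these assumed rates, a 20-bit challenge keeps the phone busy for about 21 seconds on average, while the server finishes in about 0.02 seconds. Any difficulty that is tolerable for the phone is therefore cheap for the server.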

This illustrates the conundrum posed by CAPTCHAs: developers need to devise challenges that are difficult and costly for attackers while still being relatively simple for ordinary users.

How do CAPTCHAs work?

Irrespective of the type of CAPTCHA being served, one rule always holds: the front-end client must know nothing about the solution to the CAPTCHA itself. Otherwise, it would be too easy for an automated solver to read the answer. Typically, a server sends a challenge, the front end provides the mechanism to enter the answer, and the server validates the answer it receives from the client.
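
The split between challenge and validation can be sketched as follows. This is a minimal in-memory illustration, not a production design. The class name, the method names, and the use of a plain-text math question are assumptions made to keep the example short. A real deployment would present the challenge in a form that is harder for a script to read, and would also expire unanswered challenges.

```python
import hmac
import secrets


class CaptchaServer:
    """Keeps every answer on the server; the client only ever sees the question."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}

    def issue_challenge(self) -> dict[str, str]:
        a, b = secrets.randbelow(10), secrets.randbelow(10)
        challenge_id = secrets.token_urlsafe(16)
        self._answers[challenge_id] = str(a + b)
        # Only the id and the question leave the server, never the answer.
        return {"id": challenge_id, "question": f"What is {a} + {b}?"}

    def verify(self, challenge_id: str, response: str) -> bool:
        # pop() makes each challenge single-use, so a solved one cannot be replayed.
        expected = self._answers.pop(challenge_id, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected.encode(), response.strip().encode())


server = CaptchaServer()
challenge = server.issue_challenge()
print(challenge["question"])
# The front end would now collect the user's answer and send it back
# together with challenge["id"] for the server to check.
```

Because the answer is stored only on the server and looked up by an unguessable id, inspecting the page or the network traffic reveals nothing about the solution.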

For example, the overall flow for a text-based CAPTCHA will look something like this:

Automated solvers can step in here and translate the CAPTCHA image the server generates into text input. This is why text-based CAPTCHAs have evolved to use increasingly difficult patterns, often making them hard even for humans to solve and inaccessible to those who are visually impaired.

Criticisms

While CAPTCHAs are crucial for online security, they are not without criticism.

Some say that CAPTCHAs are of little use in this era with machine learning and human solver services acting as a bridge between automated bots and CAPTCHA-protected websites. Such solver services employ real people to manually solve CAPTCHAs that a computer finds hard to decipher.

However, such critics neglect the fact that CAPTCHAs are still doing their job by making it harder for attackers to spam a service. Even if a CAPTCHA can’t entirely prevent bots from spamming a service, it makes doing so much more difficult, and that is often enough to reduce abuse.

Others argue CAPTCHAs can hamper user experience, especially if they’re too challenging. There are also accessibility concerns, since some CAPTCHAs can be difficult for users with visual impairments. As is often the case, it’s a balance between ensuring your privacy, protecting your security, and offering a user-friendly experience.

In recent times, initiatives have focused on maximizing user experience at the expense of privacy by using browsing history to determine if you’re an authentic human being or a bot. A real person is likely to have activity on many different websites over the course of a day and has likely hit CAPTCHA systems before. This history gives systems such as hCaptcha or reCAPTCHA the ability to determine whether online behavior is authentic or not before you even load a page. This is why you often just click a checkbox instead of solving a real challenge.

While convenient, these challenges often compromise your privacy. These services inevitably know your browsing behavior and the sites you’ve visited, which is a concern.

On the other hand, systems such as mCaptcha and Friendly Captcha offer more privacy but compromise security. Proof-of-work systems only add a cost to each action and typically will not be effective at preventing bots from accessing your site or posting spam.

The future of CAPTCHAs

The landscape of machine learning and AI is rapidly evolving. What is challenging for computers today might become trivial tomorrow as models become more sophisticated.

We clearly need to move towards usable CAPTCHA systems that respect user privacy and secure sites from the majority of bot and spam activity.

First, we can improve usability and accessibility by minimizing the number of CAPTCHAs real users face across the web. This is possible with protocols such as Privacy Pass, which allow good users who have already completed a CAPTCHA on one website to skip the CAPTCHA on another. This is done without revealing which websites were previously visited. So why is it not more popular? Unfortunately, current implementations of this protocol require browser extensions, which are not available for all browsers and cannot be installed by all users.
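
To give some intuition for how a token can be issued on one site and accepted on another without linking the two visits, here is a toy blind signature in Python. It is not the Privacy Pass protocol. It uses textbook RSA with tiny hard-coded primes, which is completely insecure, and every name in it is hypothetical. It only illustrates the blinding idea: the issuer signs a value it cannot read, and the client later presents the unblinded signature.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA key with tiny primes, for illustration only (utterly insecure).
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the issuer


def token_to_int(token: bytes) -> int:
    """Map a random client token to a number smaller than N."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(m: int) -> tuple[int, int]:
    """Client: hide m behind a random factor r before sending it to the issuer."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            return (m * pow(r, E, N)) % N, r


def sign_blinded(blinded: int) -> int:
    """Issuer: signs once the CAPTCHA is solved, without learning m."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Client: strip r to obtain an ordinary signature on m."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(m: int, sig: int) -> bool:
    """Any site holding the public key (N, E) can check the token."""
    return pow(sig, E, N) == m


m = token_to_int(secrets.token_bytes(16))
blinded, r = blind(m)
signature = unblind(sign_blinded(blinded), r)
print(verify(m, signature))  # True
```

The issuer only ever sees the blinded value, so when the token is later redeemed it cannot tell which earlier CAPTCHA session produced it.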

Second, we need systems that counter the threat of AI whilst not making CAPTCHAs prohibitively hard for humans. A recent paper by Searles et al. (2023) showed that bots are already more accurate than humans in solving many of the leading CAPTCHA systems.

One solution is to design CAPTCHA challenges that exploit the current limitations of AI, such as “contextual” CAPTCHAs. These are difficult for machines to solve (as of the time this article was written) since they require world knowledge and complex systems to decode. They therefore present an opportunity to stay ahead in this ever-evolving cat-and-mouse game. Their difficulty stems from the following properties:

They require multi-modal reasoning to solve: The CAPTCHA isn’t just about object recognition. It combines object recognition with contextual reasoning.

They are dynamic and varied: There can be numerous variations of questions and image combinations, making it difficult for a model to train specifically against such CAPTCHAs.

They tap into world knowledge: This approach leans on general world knowledge and common sense, areas where machines can still falter compared to humans.

For example, instead of using static images, CAPTCHAs could present simple interactive challenges that require reasoning, such as “Drag the moon below the cloud” on a canvas where various objects (stars, sun, birds, and so on) are present. Or, in an interactive CAPTCHA, you might need to drag the milk to the fridge.

Another example is to present a very short story (a few lines) and ask a question based on it. For example:

Story: Andy went to the orchard and picked 3 apples. He ate 1 and gave 2 away.

Question: “How many apples did Andy pick?” or “How many apples did Andy eat?”
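
A story challenge like the one above could be generated from a template so that each user sees a different variation. The sketch below is only an illustration of that idea. The word lists, the function name, and the returned fields are hypothetical, and a real system would need far richer templates than this one.

```python
import random

# Hypothetical word lists; a real system would draw on many more templates.
NAMES = ["Andy", "Maya", "Sam"]
FRUITS = ["apples", "pears", "plums"]


def make_story_challenge(rng: random.Random) -> dict[str, object]:
    """Build a short story, one question about it, and the expected answer."""
    name = rng.choice(NAMES)
    fruit = rng.choice(FRUITS)
    picked = rng.randint(3, 9)
    eaten = rng.randint(1, picked - 1)
    given = picked - eaten
    story = (
        f"{name} went to the orchard and picked {picked} {fruit}. "
        f"{name} ate {eaten} and gave {given} away."
    )
    questions = {
        f"How many {fruit} did {name} pick?": picked,
        f"How many {fruit} did {name} eat?": eaten,
        f"How many {fruit} did {name} give away?": given,
    }
    question = rng.choice(list(questions))
    # The answer must stay on the server; only story and question go to the client.
    return {"story": story, "question": question, "answer": questions[question]}


challenge = make_story_challenge(random.Random())
print(challenge["story"])
print(challenge["question"])
```

Passing `random.SystemRandom()` instead of `random.Random()` would make the generated challenges unpredictable, which matters if an attacker could otherwise guess the generator's state.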

Or one can show a series of images and ask questions based on common sense or contextual knowledge. For example, you could show the following images:

And then ask, “Which one can typically speak when they grow up?”

Humans can easily recognize the answer is the baby. But recognizing the context and the commonsense reasoning can be challenging tasks for a machine, even if it identifies all objects correctly.

Contextual CAPTCHAs are a very interesting future research area, but they also present several challenges:

Cultural bias: What’s considered “common sense” in one culture might be unfamiliar in another.
Language: For story-based CAPTCHAs, one needs to invest time to ensure that content is internationalized for non-English speakers. For example, in the “Drag the milk to the fridge” challenge, MILK would need to be translated to other languages to be utilized across the world.
Challenge generation: Creating challenges like these is not a trivial problem, and better solutions may require significant investments in time to get right.

In addition, it’s inevitable that someday, AI systems capable of multi-modal reasoning will be developed, and they may be better at solving these types of CAPTCHAs.

Summary

CAPTCHA systems continue to serve as a vital frontline defense against bot activity and spam attacks across the internet. While their presence is ubiquitous and enduring, the landscape is evolving. Recent breakthroughs in machine learning technologies, coupled with the emergence of CAPTCHA-solving services that employ human solvers, have begun to erode traditional CAPTCHA systems’ efficacy. Nonetheless, most attacks websites face are less sophisticated, and CAPTCHA systems remain a highly effective barrier.

As we move forward, the challenge lies not only in maintaining the robustness of CAPTCHA systems against increasingly sophisticated attacks but also in ensuring these systems are user-friendly, accessible to individuals with disabilities, respectful of user privacy, and free of undue friction or inconvenience for genuine users.

The future of CAPTCHA, therefore, calls for thoughtful innovation and rigorous research. It demands the development of new systems that can adeptly balance security with usability, accessibility, and privacy — embracing a holistic approach that evolves in tandem with the shifting tactics of malicious actors. As the digital world continues to grow and transform, CAPTCHA systems will undoubtedly need to adapt and innovate to maintain their role as a cornerstone of online security.