The 4TB time bomb: when EY's cloud went public (and what it taught us)

A deep dive into cloud misconfigurations, attack surface management, and why responsible disclosure matters

Terminal showing the 4TB SQL Server backup discovery
7 min read
Cloud security · Responsible disclosure · Attack Surface Management

Here at Neo Security, we don't just "scan." We practice a form of digital cartography.
The modern internet isn't a fixed map; it's a constantly shifting, fluid landscape of assets, relationships, and data.
Together with our partners, we map it, understand it, and find the parts that organizations have forgotten they own.

During one of these recent mapping expeditions, our lead researcher found something that made him stop and double-check his work.

Our engineers have real incident response experience. They've worked on breaches where attackers found their way in through database files that were only briefly exposed. We know the scenario well: a .BAK file leaked for five minutes. An exposure window measured in seconds. That's all it takes.

We know what the theoretical looks like when it becomes real.

The moment

One of our hackers wasn't running a broad, noisy scan. No. Instead, he was doing focused, low-level tooling work: tweaking passive data sources, stepping through raw network traffic. Hours in, staring at output buffers, he noticed something odd.

A 200 OK... on a HEAD request.

For context: a HEAD request is like knocking on a door and asking, "Who's in there, and how big is the room?" You're asking a server for metadata about a file (its size, type, last-modified date) without actually downloading it. It's supposed to be a fast, harmless query.
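
In code, that knock is a couple of lines. Here's a minimal sketch using Python's requests library; the URL is a made-up placeholder, not the bucket in question:

```python
# Ask for a file's metadata without downloading a single byte of content.
# The URL is a hypothetical placeholder, not the real storage account.
import requests

url = "https://example.blob.core.windows.net/backups/db-full.bak"  # placeholder

resp = requests.head(url, timeout=10)
print(resp.status_code)                    # 200 means "yes, this exists and you may read it"
print(resp.headers.get("Content-Length"))  # size in bytes, if the server reports it
print(resp.headers.get("Content-Type"))    # what the server claims the file is
print(resp.headers.get("Last-Modified"))   # when it was last written
```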

The server's answer was anything but harmless.

Content-Length: 4... terabytes?

He sat back.

Four terabytes. That's not a file. That's an entire organization's worth of data, billions of pages of text. That's massive.

He stared at the screen. The file names looked exactly like SQL Server backup files. His mind went to the obvious: if this is what he thinks it is, it's bad.

What is a SQL Server BAK file, and why did he take it so seriously?

A SQL Server BAK file is a complete database backup. It contains everything: the schema, all the data, stored procedures, and, critically, every secret stored in those tables.
API keys, session tokens, user credentials, cached authentication tokens, service account passwords. Whatever the application stored in the database.
Not just one secret... all the secrets.

Finding a 4TB SQL backup exposed to the public internet is like finding the master blueprint and the physical keys to a vault, just sitting there. With a note that says "free to a good home."

He'd investigated breaches that started with less. Way less. He once traced an entire ransomware incident back to a single web.config file that leaked a connection string. That was 8 kilobytes.

This was four terabytes.

The investigation process

He Googled the bucket and file specifics. A few unrelated results came up, but nothing clicked immediately. No obvious website. Someone was paying for that Azure subscription though. This was live.

Trying to confirm ownership can be hard. He started digging. Company name searches led to business merger documents. In a south-central European language. He fed them through DeepL. The translation revealed the company was acquired in 2020 by a larger entity, but the parent company name wasn't immediately obvious.

Then he ran an SOA record lookup. A "Start of Authority" DNS query, basically asking the internet's phonebook "who's really in charge of this domain?" The response came back pointing to an authoritative DNS server: ey.com.
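
For the curious, that lookup is only a few lines. A sketch assuming the third-party dnspython package; the domain is a placeholder, not the one he actually queried:

```python
# Query the SOA (Start of Authority) record to see which nameserver claims
# ultimate responsibility for a zone. Assumes the dnspython package.
import dns.resolver

answer = dns.resolver.resolve("target-domain.example", "SOA")  # placeholder domain
for soa in answer:
    print("Primary nameserver:", soa.mname)  # the authoritative server for the zone
    print("Admin contact:     ", soa.rname)  # responsible party, encoded as a domain name
```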

That's when everything clicked.

His stomach sank.

This wasn't some startup. This was Ernst & Young.
One of the Big Four accounting firms. Global. Massive.
The kind of organization that audits major corporations, handles M&A due diligence for multi-billion euro deals, and has access to financial records that could move markets.

But he still had to be sure before reaching out.

He couldn't download 4 terabytes; that's not research, that's a felony. So he did what an engineer would do: he downloaded the first thousand bytes.
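
Here's a sketch of what that looks like with an HTTP Range header, again with a placeholder URL; most blob stores answer 206 Partial Content and send only the bytes you asked for:

```python
# Fetch only the first 1,000 bytes of a remote file. stream=True ensures we
# never buffer the whole object, even if the server ignores the Range header.
import requests

url = "https://example.blob.core.windows.net/backups/db-full.bak"  # placeholder

resp = requests.get(url, headers={"Range": "bytes=0-999"}, stream=True, timeout=10)
print(resp.status_code)             # 206 if the server honoured the range
header_bytes = resp.raw.read(1000)  # read at most 1,000 bytes, never 4 TB
resp.close()
print(header_bytes[:16].hex())      # enough to inspect the file signature
```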

The glass was half full

Most file types have a "signature" at the start of the file, in the very first few bytes. It's a digital fingerprint.
A PDF starts with %PDF-.
A ZIP with PK....
A JPEG with FF D8 FF.
An ELF binary with 7F 45 4C 46.

These "magic bytes" are how the file command on Unix systems works. It doesn't look at the file extension (which anyone can fake). It looks at the actual bytes.

He parsed the magic bytes. SQL Server backup. Confirmed. This wasn't encrypted; the file format was unmistakably a database backup.

And that meant this was as bad as he thought.

He sat back and exhaled. He'd been here before, not personally discovering something this big, but cleaning up after someone else had. He remembered one incident in particular.

The context

A few years ago, he'd been called in to investigate a breach at a fintech company. They'd been hit with a ransomware attack, and the attackers had complete access to their customer database, internal tools, and cloud infrastructure.

The timeline investigation revealed the entry point: a .BAK file (a full SQL database backup) that had been publicly accessible in an Azure storage bucket. For exactly five minutes.

An engineer, working late under pressure, needed to migrate a database between environments. The VPN was flaky. The firewall rules were complex. So he made a decision: "I'll just set the bucket ACL to public for two minutes. I'll download it, then set it right back. No one will notice. What could possibly happen in two minutes?"

Modern cloud platforms make it trivially easy to export and back up your database. A few clicks, select your database, choose a destination bucket, and you're done. The export happens automatically in the background.

But here's where it gets dangerous: one wrong click, one typo in a bucket name, and suddenly your private data is sitting in a public bucket. You meant to export to company-internal-backups but accidentally typed company-public-assets. Or you created a new bucket for the export, never checked its access level, and assumed the defaults would protect you. Oops.

How easy it is to export a database to cloud storage - one wrong bucket selection away from a leak

It's that easy to accidentally leak terabytes of sensitive data. The tools are designed for convenience, not security. They assume you know what you're doing. They don't warn you that you just exported your entire customer database to a bucket that's readable by anyone on the internet.
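
If you run on Azure and want to know whether any of your own containers are sitting in that state, a check can be as small as this. A sketch assuming the azure-storage-blob SDK and a connection string in an environment variable of our choosing:

```python
# Flag every blob container in a storage account that allows anonymous access.
# Assumes the azure-storage-blob package; the environment variable name is our choice.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

for container in service.list_containers():
    # public_access is None for private containers, "blob" or "container" otherwise
    if container.public_access is not None:
        print(f"PUBLIC: {container.name} (anonymous access level: {container.public_access})")
```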

How fast exposed data is discovered after making it public

What's an ACL?

An Access Control List (ACL) is the bouncer for your cloud data. It's a list of rules that determines who can access what:

  • Rule 1 (Private): "Only Bob from accounting, connecting from this specific IP address, with this specific authentication token, can get in."
  • Rule 2 (Public): "ANYONE. From anywhere. No credentials. No questions asked. Help yourself."

That engineer flipped the bouncer's list from Rule 1 to Rule 2. "Just for a second. I'm not dealing with VPC whitelisting and ISO docs tonight."
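
For illustration, here's roughly what that "just for a second" decision looks like with the Azure SDK for Python; the container name is hypothetical:

```python
# One call is all it takes to make every blob in a container readable by
# anyone on the internet, no credentials required. Assumes azure-storage-blob.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
container = service.get_container_client("db-migration")  # hypothetical container

# The dangerous flip: container-level anonymous read access.
container.set_container_access_policy(signed_identifiers={}, public_access="container")

# ...pull the backup down over plain HTTPS, no VPN, no firewall rules...

# Flipping it back does not un-ring the bell: scanners may already have the data.
container.set_container_access_policy(signed_identifiers={}, public_access=None)
```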

Boom.

What he didn't know is that attackers don't just casually scan. They deploy thousands of automated scanners across every corner of the internet.
Compromised IoT device? Botnet. Hacked home router? Botnet. Pwned cloud instance? Botnet.

This distributed scanning infrastructure doesn't browse casually.
It sweeps the entire IPv4 space (that's 4.3 billion addresses) in minutes.
It's massively parallel, geographically distributed, and hyper-optimized for one thing: finding exposed data.

It's an automated gold rush. A constant race to find the next open S3 bucket, the next public Azure blob, the next misconfigured GCS bucket. The window between "misconfigured" and "exfiltrated" isn't measured in hours or minutes. It's measured in seconds.

In that fintech breach, the engineer changed it back to private at the 5-minute mark, thinking he was safe. He wasn't. The entire database (PII, credentials, trade secrets) was already gone.

Here's the weird part: their homepage traffic spiked 400% during that window. Why? Automated scrapers hitting every endpoint, probing every path, looking for more. Not humans browsing. Bots. Thousands of them.

Our researcher had watched that company go under. He was in the room when they made the breach notification to their customers. All because of five minutes.

Back to the present

So when he saw that 4TB SQL Server backup sitting there, publicly accessible, belonging to EY, he didn't think "interesting security finding." He thought about that fintech company. He thought about the timeline. He thought about the five-minute window.

But here's the thing: the question isn't even "which hacker took it?" The question is: "who didn't?"

That file was sitting there, publicly accessible, for an unknown amount of time. Could have been hours. Could have been days. In that window, with the scanning infrastructure that exists, it's not a question of if someone found it. It's a question of how many.

When something this big sits exposed on the public internet, you don't get to ask "did someone find it?" You have to assume everyone found it.

We immediately stopped all investigation. The clock was ticking. Every second that file was exposed was another chance for someone else to find it. Someone who wouldn't responsibly disclose.

The hard part: we scrambled to find a security@ mailbox, a vulnerability disclosure program, anything. Nothing. It was the weekend.

This is the uncomfortable reality of responsible disclosure. Our researcher went to LinkedIn and started cold-messaging people. "Hi, I'm a security researcher, I think I've found something critical, can you please get me to your security team?" After 15 attempts, he found someone who understood and connected him to the CSIRT.

The response

From that moment on? Textbook perfect. Professional acknowledgment. No defensiveness, no legal threats. Just: "Thank you. We're on it."

Clear, technical communication. Engineer to engineer. No jargon-filled corporate speak. Just solid incident response.

One week later, the issue was triaged and fully remediated.

A huge shout-out to EY's security team.

They handled it exactly as you'd hope. This is what mature security response looks like. And frankly, it's rare. We've had companies threaten us with lawsuits for telling them their database was public. We've had companies ghost us for months. We've had companies claim "it's not a bug, it's a feature."

EY? They just fixed it. No drama. No bullshit. Just professionalism.

Why the cloud can be a mess

Here's what concerns our researcher: if EY (with all their resources, security teams, compliance frameworks, ISO certifications, and Big Four budget) can have a 4TB SQL Server backup sitting publicly accessible on the internet, then anyone can.

The modern cloud is too complex. Too fast-moving. Too ephemeral.
Traditional security assessments can't keep up. You're not manually racking servers anymore. You're clicking buttons in a web UI, running Terraform scripts, deploying with CI/CD pipelines.
Infrastructure is code. Infrastructure is fast. And fast means mistakes happen at scale.

That 4TB file? It might have been exposed for an hour, a day, a week. We don't know. That fintech .BAK? Five minutes was enough.

The risk isn't some shadowy hacker specifically targeting you. The risk is the automation. The massive, distributed scanning infrastructure that never sleeps, never blinks, and finds everything within seconds of it being exposed.

You cannot defend what you do not know you own.

You need the same continuous, automated, adversarial visibility that the attackers have. You need to be the first to find your own 4TB SQL Server backup. You need to scan like they scan.

This is why Attack Surface Management isn't optional anymore.

~/security/disclosure/ey-4tb-leak

$ let's talk, and act.

We can help you look into your posture and perform an OSINT assessment to show you exactly what's visible from the outside, from an attacker's perspective. Not what your vulnerability scanner says. Not what your penetration test found. What's actually exposed on the public internet, right now.

Using the cloud but not 100% sure how many backups or sensitive files might be one "temporary" ACL change away from leaking? Deployed any database backups lately? Got snapshots floating around? Using Azure? AWS? GCP? All three?

Let's get on a call.

We're engineers, not salespeople. We'll skip the pitch and just show you the data. We'll help you find your exposures before someone else does.

Because our researcher has seen what happens when you don't. He's seen the five-minute leak. He's seen what happens when you ask "which hacker took it?" instead of "who didn't?"

So let's really find out what's exposed.