How To Check If Staff Emails Are in Data Breaches

Are the login credentials of your staff on the dark web? We show you how to check whether their data has been caught up in a data breach.

Our Old Friend, the Password

The humble password is still the most common method of authenticating yourself to gain access to a computer or online account. Other systems exist and will continue to appear and evolve but right now, the password is ubiquitous.

The password is a child of the sixties. During the development of the Compatible Time-Sharing System (CTSS), computer scientists realized the files belonging to each user needed to be isolated and protected. A user should be able to see and amend their own files, but they shouldn’t be allowed to see files belonging to someone else.

The solution meant users had to be identified. They needed a user name. And to prove the user was who they said they were, the password was invented. The credit for the invention of the password goes to Fernando J. Corbató.

The trouble with passwords is that anyone who knows your password can access your account. It’s like giving them a spare key to your house. Two-factor authentication (2FA) improves this situation. It combines something you know (your password) with something you own (typically your smartphone). When you enter your password into a system with 2FA, a code is sent to your smartphone. You need to enter that code into the computer, too. But 2FA doesn’t replace the password. It augments the security model of the standard password.

Biometrics are being introduced in some systems, too. They add a unique biological identifier, something you are, such as a fingerprint or facial recognition. This pushes beyond two-factor authentication and into multifactor authentication. These newer technologies are unlikely to filter through to the majority of computer systems and online services for a long time yet, and will probably never arrive in some systems. The password is going to be with us for a long time.

Data Breaches

Data breaches are happening incessantly. The data from these breaches eventually arrives on the dark web where it is sold to other cybercriminals. It can be used in scam emails, phishing emails, different types of fraud and identity theft, and to access other systems. Credential stuffing attacks use automated software to try to log in to systems. These databases of emails and passwords provide the ammunition for those attacks.

People have a bad habit of reusing passwords. Instead of having a unique robust password per system, they often reuse a single password again and again on multiple systems.

It only takes one of those sites to be compromised for all of the other sites to be at risk. The threat actors don’t just know your password to the breached site (which you will change as soon as you hear there’s been a breach). They can use that email and password to access your other accounts.

10 Billion Breached Accounts

The Have I Been Pwned website collects the data sets from all the data breaches it can. You can search all of that combined data and see whether your email address has been exposed in a breach. If it has, Have I Been Pwned tells you which site or service the data came from. You can then go to that site and change your password or close your account. And if you’ve used the password you used on that site on any other sites, you need to go and change it on those sites, too.

There are currently over 10 billion data records in the Have I Been Pwned database. What are the chances one or more of your email addresses are in there? Perhaps a better question would be what are the odds that your email address isn’t in there?

Searching for an Email Address

Checking is easy. Go to the Have I Been Pwned website, enter your email address into the “Email address” field, and click the “Pwned?” button.

I entered an old email address and found it had been included in six data breaches.

The important points to note are:

Domain Searches

As illuminating and useful as this is, entering the email addresses for all your staff will be time-consuming. Have I Been Pwned’s answer to this is the domain search function. You can register your domain and obtain a report covering any and all email addresses on that domain that have been found in breaches.

And if any email addresses on your domain appear in future breaches, you’ll be notified. That’s pretty cool.

You have to prove ownership of the domain, of course. There are different ways to achieve this. You can:

This is a great free service and well worth the few moments it takes to register.

Searching for Unrelated Emails

But what if you have a rag-tag collection of emails to check, scattered across different domains? You might have email addresses for gmail.com and other domains that you’re obviously not going to be able to prove ownership of.

Here’s a Linux shell script that takes a text file as a command-line parameter. The text file should contain email addresses, one per line. The script performs a Have I Been Pwned email search for each email address in the text file.

The script makes use of an authenticated API. You’re going to need an API key. To get a key, you need to register and pay for the service. Troy Hunt has written a thorough blog post on the topic of charging for the use of the API. He explains with complete candour why he was forced to charge as a way to combat API abuse. The cost is USD 3.50 per month, which is less than a coffee from a high street outlet. You can pay for one month, or you can subscribe for a year.

Here’s the entire script.

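#!/bin/bash

# Require exactly one argument: a file of email addresses, one per line
if [[ $# -ne 1 ]]; then
  echo "Usage:" $0 "file-containing-email-addresses"
  exit 1
fi

# Loop over each email address in the file
for email in $(cat "$1")
do
  echo "$email"

  # Query the breached account API and print one line per breach
  curl -s -A "CloudSavvyIT" \
  -H "hibp-api-key:your-API-key-goes-here" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/$email?truncateResponse=false" \
  | jq -j '.[] | "  ", .Title, " [", .Name, "] ", .BreachDate, "\n"'

  echo "---"
  # Pause between requests so the API does not block us
  sleep 1.6
done

exit 0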

Before we explain how the script works, you might have noticed it makes use of curl and jq. If you don’t have these installed on your computer, you’ll need to add them.

On Ubuntu, the commands are:

sudo apt-get install curl

sudo apt-get install jq

On Fedora, you need to type:

sudo dnf install curl

sudo dnf install jq

On Manjaro, you’ll use pacman:

sudo pacman -Syu curl

sudo pacman -Syu jq

How the Script Works

The variable $# holds the number of command-line parameters that were passed to the script. If this does not equal one, the usage message is displayed and the script exits. The variable $0 holds the name of the script.

if [[ $# -ne 1 ]]; then
  echo "Usage:" $0 "file-containing-email-addresses"
  exit 1
fi
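As a minimal sketch, the same check can be wrapped in a function that also confirms the file is readable, which the script above does not do. The function name require_one_arg is hypothetical and chosen only for this illustration.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): succeeds only when
# exactly one argument is supplied and that argument names a readable file.
require_one_arg() {
  if [[ $# -ne 1 ]]; then
    echo "Usage: $0 file-containing-email-addresses" >&2
    return 1
  fi
  if [[ ! -r $1 ]]; then
    echo "Cannot read file: $1" >&2
    return 1
  fi
  return 0
}
```

In a script, it could be called as require_one_arg "$@" || exit 1 before the loop starts.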

The script reads the email addresses from the text file using cat, and sets $email to hold the email address currently being processed.

for email in $(cat "$1")
do
  echo "$email"
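The for loop splits the file on whitespace, which is fine for a clean list with one address per line. If your list might contain blank lines, comment lines, or Windows line endings, a while read loop is a common alternative. This is a sketch under those assumptions, and the function name read_emails is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: prints the usable addresses from a file, one per line.
# It skips blank lines and lines starting with '#', and strips a trailing
# carriage return left behind by Windows line endings.
read_emails() {
  local line
  # The "|| [[ -n $line ]]" part keeps a final line that has no newline
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}
    [[ -z $line || $line == \#* ]] && continue
    printf '%s\n' "$line"
  done < "$1"
}
```

The cleaned list could then feed the existing loop, for example with for email in $(read_emails "$1").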

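The curl command is used to access the API and to retrieve the result. The options we’re using with it are -s (silent), which stops curl from printing its progress meter; -A (user agent), which sets the user agent string sent with the request; and -H (header), which adds the extra HTTP header that carries the API key.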

The curl command sends the request to the Have I Been Pwned breached account API URL. The response is piped into jq.

jq extracts the title ( .Title ) of the breach, the internal identifier ( .Name ) for the breach, and the date of the breach ( .BreachDate ) from the unnamed array ( .[] ) holding the JSON information.

  curl -s -A "CloudSavvyIT" \
  -H "hibp-api-key:your-API-key-goes-here" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/$email?truncateResponse=false" \
  | jq -j '.[] | "  ", .Title, " [", .Name, "] ", .BreachDate, "\n"'

  echo "---"
  sleep 1.6
done

exit 0
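The script places each address into the URL exactly as it appears in the file. That works for plain addresses, but an address containing a character such as + may not be interpreted as intended unless it is URL-encoded first. Here is a minimal pure-bash sketch for ASCII addresses. The function name urlencode is our own and is not part of curl or the API.

```shell
#!/bin/bash

# Hypothetical helper: percent-encodes every character except the
# unreserved set (letters, digits, '.', '~', '_' and '-').
# Intended for ASCII input such as typical email addresses.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"   # "'x" yields the character code of x
         out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}
```

With this in place, the URL in the curl command could use $(urlencode "$email") instead of $email.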

A couple of spaces are displayed before the breach title to indent the output. This makes it easier to differentiate between email addresses and breach names. Brackets have been placed on either side of the .Name data item to help with visual parsing. These are simple cosmetics and can be changed or removed, to suit your needs.
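To experiment with the output format without calling the API, you can feed jq a small hand-written sample. The sketch below uses invented data shaped like the three fields the script reads, and the names sample_json and format_breaches are hypothetical. It uses jq string interpolation, which is an alternative way to build the same line.

```shell
#!/bin/bash

# Invented sample data for illustration only; these are not real breaches.
sample_json='[
  {"Name": "ExampleForum", "Title": "Example Forum", "BreachDate": "2019-01-01"},
  {"Name": "SampleShop", "Title": "Sample Shop", "BreachDate": "2020-06-15"}
]'

# Hypothetical helper: reads a JSON array of breaches on stdin and prints
# one indented line per breach, in the same layout as the script.
format_breaches() {
  jq -r '.[] | "  \(.Title) [\(.Name)] \(.BreachDate)"'
}
```

Running printf '%s\n' "$sample_json" | format_breaches prints one indented line per sample entry, so you can adjust the spacing or brackets and see the effect immediately.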

Three dashes are displayed to separate the data for each email address, and a pause of 1.6 seconds is added between checks. This is required to avoid calling the API too frequently and getting temporarily blocked.
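The script doesn’t look at the HTTP status code, so an address with no breaches and a rejected request both produce empty output. As a sketch, and assuming the status codes behave as we understand them from the API documentation (200 for breaches found, 404 for none, 401 for a bad key, 429 for too many requests), a small helper could tell these cases apart. The function name status_action and its labels are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: maps an HTTP status code to a short label.
# The meanings are assumptions based on our reading of the API documentation.
status_action() {
  case $1 in
    200) echo "breaches-found" ;;
    404) echo "no-breaches" ;;
    401) echo "check-api-key" ;;
    429) echo "slow-down" ;;
    *)   echo "unexpected" ;;
  esac
}

# Example use (needs network access and a valid key, so it is commented out):
# status=$(curl -s -o /dev/null -w '%{http_code}' -A "CloudSavvyIT" \
#   -H "hibp-api-key:your-API-key-goes-here" \
#   "https://haveibeenpwned.com/api/v3/breachedaccount/$email")
# status_action "$status"
```

If the helper reports slow-down, lengthening the sleep value is the obvious first adjustment.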

There are 15 data items that you could choose to have displayed. The full list is shown on the API pages of the website.

Running the Script

Copy the whole script into an editor, replace your-API-key-goes-here with your API key, then save it as “pwnchk.sh.” To make it executable, run this command:

chmod +x pwnchk.sh
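If you’d rather not store the key inside the script file, one option is to read it from an environment variable. This is a minimal sketch. The variable name HIBP_API_KEY and the function name hibp_key_header are our own choices and are not defined by the API.

```shell
#!/bin/bash

# Hypothetical helper: builds the API key header from an environment
# variable so the key does not have to be written into the script.
hibp_key_header() {
  if [[ -z ${HIBP_API_KEY:-} ]]; then
    echo "HIBP_API_KEY is not set" >&2
    return 1
  fi
  printf 'hibp-api-key:%s\n' "$HIBP_API_KEY"
}
```

The -H option in the curl command could then become -H "$(hibp_key_header)", with the key exported in your shell before the script runs.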

We have a text file called “email-list.txt.” It contains these email addresses:

That’s the president and vice president of the United States, and the private office of the prime minister of the United Kingdom. They’re all publicly available email addresses, so we’re not breaking any privacy or security protocols using them here. For convenience, we’re piping the output into less. You could just as easily redirect the output to a file.

./pwnchk.sh email-list.txt | less

The first line mentions “2,844 Separate Data Breaches.”

That’s the name of a collection of breached data made up of 2,844 smaller breaches. It doesn’t mean that email address has been in that many breaches.

Scroll through the output, and you’ll see that those email addresses have been found in multiple breaches dating all the way back to a Myspace breach of 2008.

A Final Word on Passwords

You can also search for passwords on Have I Been Pwned. If a match is found, it doesn’t necessarily mean that the password in the data breach is yours. What it probably means is that your password is not unique.

The weaker your password is, the less likely it will be unique. For example, the favorite password of the lazy user, 123456, had 23.5 million matches. That’s why searching by email is the better option.

Always use robust unique passwords. Use a password manager if you have too many passwords to remember. Where 2FA is offered, use it.

The script we’ve presented will help you to check a disparate list of email addresses. It’ll save you a bunch of time, especially if it is something you’re going to run periodically.

Dave McKay
Dave McKay first used computers in the industry when punched paper tape was in vogue and he has been programming ever since. His use of computers pre-dates the birth of the PC and the public release of Unix. He has programmed in everything from 6502 assembly to Lisp, and from Forth to C#. He is now a technology journalist and independent Data Protection and Compliance consultant.