Have I/you been pwned? There's now an API for that...

troyhunt.com - Have I been pwned? You can now ask the API!

I got a lot of requests after launching HIBP for an API and I saw some great ideas come up in terms of how it might be used for very constructive purposes. Truth be told, there was an API from day one insofar as this was precisely what the web UI was hitting every time you searched for an email address anyway, I just hadn’t published any docs on it or promoted its existence.

That said, I did give it a bit of tweaking to make it more “RESTful” (this, apparently, is what all APIs must be these days) and it works like this:

...

There’s also CORS support so you can happily hit the API directly from within another web app on a different domain. It’s all documented on the HIBP site.

That is all.

There is no authentication.

There is no rate limiting.

There is no cost.

Those decisions may turn out to be insightful in that it means it’s exceptionally easy to use and doesn’t place any unnecessary barriers in front of people, or it may be naive and it’ll be abused no end in ways I haven’t even begun to consider. Or both. On the abuse side though, seriously, if you want a big pile of email addresses then go and download Adobe and the others, they’re dead easy to find and it’s a heap easier than enumerating through addresses one by one over HTTP in the hope of getting a hit.

I’ve made the API available because it was easy to do and I’ve made it freely available as it shouldn’t have any cost impact. The compute resources required are tiny and the egress data is measured in bytes – it’s a very efficient process even though it’s searching through 154M records.

Finally, on the structure of the API, I did toss up whether to implement it in what is theoretically the more RESTful approach you see above (the email address in the path implies a resource) as opposed to a more query-centric approach by passing a value such as email={email}. I asked the question on Twitter and saw vigorous debate arguing the merits of each approach. I’ve published the one described above, but it’s still accessible via query string as well (I haven’t changed the way the search feature on the website uses this). Do feel free to add your thoughts about this or other aspects in the comments below, I’m sure this is but the first phase of many enhancements to come.
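To make the two styles concrete, here is a minimal Python sketch of how a client might build each kind of URL. The base URL and the function names are hypothetical placeholders made up for illustration; the excerpt above elides the real endpoint, so check the HIBP docs for the actual address. Only the email={email} parameter name comes from the post itself.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL, for illustration only. The real endpoint is
# documented on the HIBP site and is not reproduced here.
BASE_URL = "https://api.example.invalid/lookup"


def path_style_url(email: str) -> str:
    """Resource-style lookup: the email address is a path segment."""
    # safe="" also percent-encodes "/", so the address stays one segment.
    return f"{BASE_URL}/{quote(email, safe='')}"


def query_style_url(email: str) -> str:
    """Query-style lookup: the address is passed as email={email}."""
    return f"{BASE_URL}?{urlencode({'email': email})}"


if __name__ == "__main__":
    print(path_style_url("foo@bar.com"))
    print(query_style_url("foo@bar.com"))
```

Either way the address has to be percent-encoded before it goes on the wire; the difference is only whether it reads as a resource in the path or as a search parameter.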

I’ll ask one favour from those of you who make good use of it – tell me [Troy] about it. If you can share it publicly then leave a comment here, if you want to share it privately then send me an email. ...

Introducing “Have I been pwned?” – aggregating accounts across website breaches

I often write up analyses of the passwords disclosed in website breaches. For example, there was A brief Sony password analysis back in mid-2011 and then our local Aussie ABC earlier this year where I talked about Lousy ABC cryptography cracked in seconds as Aussie passwords are exposed. I wrote a number of other pieces looking specifically at the nature of the data exposed in individual sites, but what I really found interesting was when I started comparing breaches.

In the middle of last year I wrote What do Sony and Yahoo! have in common? Passwords! and found that 59% of people with accounts in both sources used the same password. Then just last month when I wrote about “the mother of all breaches” in Adobe credentials and the serious insecurity of password hints, I found that many of the accounts from the Sony breach were also in Adobe’s. In that case I explained how this put personal information at serious risk as the unencrypted password hints in Adobe’s breach often had the answers in the unencrypted Sony passwords!

As I analysed various breaches I kept finding user accounts that were also disclosed in other attacks – people were having their accounts pwned over and over again. So I built this:

The site is now up and public at haveibeenpwned.com so let me share what it’s all about.

About HIBP...

Working with 154 million records on Azure Table Storage – the story of “Have I been pwned?”

I’m one of these people that must learn by doing. Yes, I’m sure all those demos look very flashy and the code appears awesome, but unless I can do it myself then I have trouble really buying into it. And I really want to buy into Azure because frankly, it’s freakin’ awesome.

This is not a “yeah but you’re an MVP so you’ve gotta say that / you’re predispositioned to say that / you’re getting kickbacks from Ballmer”. I don’t, I’m not and I wish!

As many of you will know by now, yesterday I launched Have I been pwned? (HIBP) which as I briefly mentioned in that blog post, runs on Windows Azure. Now I’ve run stuff on Azure before, but it’s usually been the classic website and database model translated to the Azure paradigm rather than using the innovative cloud services that Azure does well.

When I came to build HIBP, I had a challenge: How do I make querying 154 million email addresses as fast as possible? Doing just about anything with the data in SQL Server was painfully slow to the extent that I ended up creating a Windows Azure SQL Server VM with 56GB of RAM just to analyse it in order to prepare the info for the post I wrote on the insecurity of password hints. Plus, of course, the data will grow – more pwning of sites will happen and sooner or later there’ll be another “Adobe” and we’ll be looking at 300M records that need to be queried.

The answer was Azure Table Storage and as it turns out, it totally rocks.

Azure table storage – the good, the bad and the awesome ...

Please make sure you click through and read the full articles. Troy goes into much more detail and provides some great info, in an easy-to-grok format and style.