Our paper, as titled, has been accepted by DIMVA 2015 in Milano, Italy. The final paper will not be released until July, so this post gives a brief summary of what we have done. The other focus here is the implementation of the web crawler we designed for OAuth Detection (OAD), which is now open source. K.R.K.C.
0. OAuth 2.0
While there is an argument that OAuth 1.0 is more gracefully designed (and more secure) compared to OAuth 2.0, the latter has become the dominant deployment. Flagship IT companies such as Google, Facebook and Twitter provide OAuth 2.0 authentication services for third-party applications. These services let the user use an application without exposing personal information (e.g., a password) to the application itself and without registering a new account for each service. For example, we can use a Facebook account to log in to Spotify without exposing the password to Spotify or registering with Spotify.
1. Cross-Site Request Forgery (CSRF)
“A CSRF is a common form of confused deputy attack where a user’s previous session data is used by an attacker to make a malicious request on behalf of the victim.” [our paper] The most common countermeasure for this attack is to include a random token in the URL/request to implement a challenge-response mechanism. Like other HTTP/HTTPS-based protocols, OAuth 2.0 is vulnerable to CSRF attacks. The good thing is that this attack is well documented and discussed in the OAuth 2.0 RFC. According to the RFC, a “state” parameter is recommended as the carrier of the random value and should be added to the authorization request URL. As long as the OAuth 2.0 implementation handles the “state” parameter correctly, CSRF attacks should be prevented.
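To make the “state” mechanism concrete, here is a minimal Python sketch of how an application might generate and check the value. It is not taken from our paper or from any particular library: the function names, the dict-like `session` object and the example URLs are all assumptions made for illustration.

```python
import hmac
import secrets
from urllib.parse import urlencode, urlparse, parse_qs


def build_authorization_url(endpoint, client_id, redirect_uri, session):
    """Build an authorization request URL that carries a fresh random state.

    `session` is assumed to be a dict-like, per-user session store.
    """
    state = secrets.token_urlsafe(32)  # unguessable random value
    session["oauth_state"] = state     # remembered for the callback check
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{endpoint}?{query}"


def is_callback_state_valid(callback_url, session):
    """Accept the callback only if its state matches the stored value."""
    expected = session.pop("oauth_state", None)  # each state is single-use
    received = parse_qs(urlparse(callback_url).query).get("state", [None])[0]
    if expected is None or received is None:
        return False
    # Constant-time comparison on bytes, so arbitrary input cannot raise.
    return hmac.compare_digest(expected.encode(), received.encode())
```

A forged callback fails this check because the attacker cannot know the value stored in the victim's session. The value is removed on first use, so a replayed callback fails as well.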
2. So, what is wrong here?
In this paper, we are interested in what happens in real-world OAuth 2.0 implementations and deployments. Specifically, we are interested in CSRF vulnerabilities in the wild. We analyzed the Alexa Top 10,000 domains using our OAD crawler, which crawled and analyzed more than 5 million URLs in total. Surprisingly (or maybe not), 25% of the websites using OAuth 2.0 appear vulnerable to CSRF attacks. The root cause is simple: either the Identity Provider (IdP, providing OAuth authentication services) does not enforce the “state” parameter, or the application developer does not implement it at all. This again reminds me of the giant gap between theory and practice: in theory, there is no big difference between theory and practice; in practice, there is!
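As a rough illustration of the kind of check involved, the sketch below flags a URL that looks like an OAuth 2.0 authorization request but carries no “state” parameter. This is a simplified heuristic written for this post, not the actual detection logic of OAD, and the function names are hypothetical. A missing “state” only suggests a possible weakness, since a site may use some other CSRF defense.

```python
from urllib.parse import urlparse, parse_qs

# response_type values defined for the two redirect-based grants in RFC 6749
KNOWN_RESPONSE_TYPES = {"code", "token"}


def looks_like_oauth2_request(url):
    """Heuristic: client_id plus a known response_type in the query string."""
    params = parse_qs(urlparse(url).query)
    response_types = params.get("response_type", [""])[0].split()
    return "client_id" in params and any(
        rt in KNOWN_RESPONSE_TYPES for rt in response_types
    )


def appears_csrf_vulnerable(url):
    """True if the URL looks like an OAuth 2.0 request with no state value.

    parse_qs drops blank values by default, so an empty `state=` also
    counts as missing.
    """
    if not looks_like_oauth2_request(url):
        return False
    params = parse_qs(urlparse(url).query)
    return not params.get("state")
```

For example, `appears_csrf_vulnerable("https://idp.example/auth?response_type=code&client_id=abc")` returns `True`, while the same URL with `&state=xyz` appended returns `False`.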
3. OAuth Detector (OAD)
While more details can be found in the paper, here we focus on the design and implementation of our web crawler, OAD. Scrapy may be the most popular and convenient Python web crawler framework one can think of. Personally, I do support using Scrapy for fast and common web crawler implementations. At least, this is what it is designed for. Our concerns regarding existing general-purpose web crawlers, such as Scrapy, are:
a. They seem to be heavyweight.
b. What if we need low-level control or debugging, for example of threading?
c. What about performance over a long run?