Cached and Confused: Web Cache Deception in the Wild

Web cache deception (WCD) is an attack proposed in 2017, where an attacker tricks a caching proxy into erroneously storing private information transmitted over the Internet and subsequently gains unauthorized access to that cached data. Due to the widespread use of web caches and, in particular, the use of massive networks of caching proxies deployed by content distribution network (CDN) providers as a critical component of the Internet, WCD puts a substantial population of Internet users at risk.

We present the first large-scale study that quantifies the prevalence of WCD in 340 high-profile sites among the Alexa Top 5K. Our analysis reveals WCD vulnerabilities that leak private user data as well as secret authentication and authorization tokens that can be leveraged by an attacker to mount damaging web application attacks. Furthermore, we explore WCD in a scientific framework as an instance of the path confusion class of attacks, and demonstrate that variations on the path confusion technique used make it possible to exploit sites that are otherwise not impacted by the original attack. Our findings show that many popular sites remain vulnerable two years after the public disclosure of WCD.

Our empirical experiments with popular CDN providers underline the fact that web caches are not plug & play technologies. In order to mitigate WCD, site operators must adopt a holistic view of their web infrastructure and carefully configure cache settings appropriate for their applications.

USENIX Security Symposium, 2020

(PortSwigger‘s Top Web Hacking Technique of 2019)

HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Guided Micro-Fuzzing

Contemporary fuzz testing techniques focus on identifying memory corruption vulnerabilities that allow adversaries to achieve either remote code execution or information disclosure. Meanwhile, Algorithmic Complexity (AC)vulnerabilities, which are a common attack vector for denial-of-service attacks, remain an understudied threat. In this paper, we present HotFuzz, a framework for automatically discovering AC vulnerabilities in Java libraries. HotFuzz uses micro-fuzzing, a genetic algorithm that evolves arbitrary Java objects in order to trigger the worst-case performance for a method under test. We define Small Recursive Instantiation (SRI) as a technique to derive seed inputs represented as Java objects to micro-fuzzing. After micro-fuzzing, HotFuzz synthesizes test cases that triggered AC vulnerabilities into Java programs and monitors their execution in order to reproduce vulnerabilities outside the fuzzing framework. HotFuzz outputs those programs that exhibit high CPU utilization as witnesses for AC vulnerabilities in a Java library. We evaluate HotFuzz over the Java Runtime Environment (JRE), the 100 most popular Java libraries on Maven, and challenges contained in the DARPA Space and Time Analysis for Cybersecurity (STAC) program. We evaluate SRI’s effectiveness by comparing the performance of micro-fuzzing with SRI, measured by the number of AC vulnerabilities detected, to simply using empty values as seed inputs. In this evaluation, we verified known AC vulnerabilities, discovered previously unknown AC vulnerabilities that we responsibly reported to vendors, and received confirmation from both IBM and Oracle. Our results demonstrate that micro-fuzzing finds AC vulnerabilities in real-world software, and that micro-fuzzing with SRI-derived seed inputs outperforms using empty values.

Network and Distributed System Security Symposium (NDSS), 2020

A Longitudinal Analysis of the ads.txt Standard

Programmatic advertising provides digital ad buyers with the convenience of purchasing ad impressions through Real Time Bidding (RTB) auctions. However, programmatic advertising has also given rise to a novel form of ad fraud known as domain spoofing, in which attackers sell counterfeit impressions that claim to be from high-value publishers. To mitigate domain spoofing, the Interactive Advertising Bureau (IAB) Tech Lab introduced the ads.txt standard in May 2017 to help ad buyers verify authorized digital ad sellers, as well as to promote overall transparency in programmatic advertising.

In this work, we present a 15-month longitudinal, observational study of the ads.txt standard. We do this to understand (1) if it is helping ad buyers to combat domain spoofing and (2) whether the transparency offered by the standard can provide useful data to researchers and privacy advocates.

With respect to halting domain spoofing, we observe that over 60% of Alexa Top-100K publishers have adopted ads.txt, and that ad exchanges and advertisers appear to be honoring the standard. With respect to transparency, the widespread adoption of ads.txt allows us to explicitly identify over 1,000 domains belonging to ad exchanges, without having to rely on crowdsourcing or heuristic methods.

However, we also find that ads.txt is still a long way from reaching its full potential. Many publishers have yet to adopt the standard, and we observe major ad exchanges purchasing unauthorized impressions that violate the standard. This opens the door to domain spoofing attacks. Further, ads.txt data often includes errors that must be cleaned and mitigated before the data is practically useful.

ACM Internet Measurement Conference (IMC), 2019

Understanding and Mitigating the Security Risks of Content Inclusion in Web Browsers

Thanks to the wide range of features offered by web browsers, modern websites include various types of content such as JavaScript and CSS in order to create interactive user interfaces. Browser vendors also provided extensions to enhance web browsers with additional useful capabilities that are not necessarily maintained or supported by default.

However, included content can introduce security risks to users of these websites, unbeknownst to both website operators and users. In addition, the browser’s interpretation of the resource URLs may be very different from how the web server resolves the URL to determine which resource should be returned to the browser. The URL may not correspond to an actual server-side file system structure at all, or the web server may internally rewrite parts of the URL. This semantic disconnect between web browsers and web servers in interpreting relative paths (path confusion) could be exploited by Relative Path Overwrite (RPO). On the other hand, even tough extensions provide useful additional functionality for web browsers, they are also an increasingly popular vector for attacks. Due to the high degree of privilege extensions can hold, extensions have been abused to inject advertisements into web pages that divert revenue from content publishers and potentially expose users to malware.

In this thesis, I propose novel research into understanding and mitigating the security risks of content inclusion in web browsers to protect website publishers as well as their users. First, I introduce an in-browser approach called Excision to automatically detect and block malicious third-party content inclusions as web pages are loaded into the user’s browser or during the execution of browser extensions. Then, I propose OriginTracer, an in-browser approach to highlight extension-based content modification of web pages. Finally, I present the first in-depth study of style injection vulnerability using RPO and discuss potential countermeasures.

Doctor of Philosophy (PhD) Thesis, 2019

On the Effectiveness of Type-based Control Flow Integrity

Control flow integrity (CFI) has received significant attention in the community to combat control hijacking attacks in the presence of memory corruption vulnerabilities. The challenges in creating a practical CFI has resulted in the development of a new type of CFI based on runtime type checking (RTC). RTC-based CFI has been implemented in a number of recent practical efforts such as GRSecurity Reuse Attack Protector (RAP) and LLVM-CFI. While there has been a number of previous efforts that studied the strengths and limitations of other types of CFI techniques, little has been done to evaluate the RTC-based CFI.

In this work, we study the effectiveness of RTC from the security and practicality aspects. From the security perspective, we observe that type collisions are abundant in sufficiently large code bases but exploiting them to build a functional attack is not straightforward. Then we show how an attacker can successfully bypass RTC techniques using a variant of ROP attacks that respect type checking (called TROP) and also built two proof-of-concept exploits, one against Nginx web server and the other against Exim mail server. We also discuss practical challenges of implementing RTC. Our findings suggest that while RTC is more practical for applying CFI to large code bases, its policy is not strong enough when facing a motivated attacker.

Annual Computer Security Applications Conference (ACSAC), 2018

How Tracking Companies Circumvented Ad Blockers Using WebSockets

In this study of 100,000 websites, we document how Advertising and Analytics (A&A) companies have used WebSockets to bypass ad blocking, exfiltrate user tracking data, and deliver advertisements. Specifically, our measurements investigate how a long-standing bug in Chrome’s (the world’s most popular browser) chrome.webRequest API prevented blocking extensions from being able to interpose on WebSocket connections.

We conducted large-scale crawls of top publishers before and after this bug was patched in April 2017 to examine which A&A companies were using WebSockets, what information was being transferred, and whether companies altered their behavior after the patch. We find that a small but persistent group of A&A companies use WebSockets, and that several of them engaged in troubling behavior, such as browser fingerprinting, exfiltrating the DOM, and serving advertisements, that would have circumvented blocking due to the Chrome bug.

ACM Internet Measurement Conference (IMC), 2018

How Tracking Companies Circumvent Ad Blockers Using WebSockets

In this study of 100,000 websites, we document how Advertising and Analytics (A&A) companies have used WebSockets to bypass ad blocking, exfiltrate user tracking data, and deliver advertisements. Specifically, we leverage a longstanding bug in Chrome (the world’s most popular browser) in the chrome.webRequest API that prevented blocking extensions from being able to interpose on WebSocket connections.

We conducted large-scale crawls of top publishers before and after this bug was patched in April 2017 to examine which A&A companies were using WebSockets, what information was being transferred, and whether companies altered their behavior after the patch. We find that a small but persistent group of A&A companies use WebSockets, and that several of them are engaging in troubling behavior, such as browser fingerprinting, exfiltrating the DOM, and serving advertisements, that would have circumvented blocking due to the Chrome bug.

IEEE S&P Workshop on Technology and Consumer Protection (ConPro), 2018

Large-Scale Analysis of Style Injection by Relative Path Overwrite

 

Relative Path Overwrite (RPO) is a recent technique to inject style directives into websites even when no style sink or markup injection vulnerability is present. It exploits differences in how browsers and web servers interpret relative paths (i.e., path confusion) to make a HTML page reference itself as a stylesheet; a simple text injection vulnerability along with browsers’ leniency in parsing CSS resources results in an attacker’s ability to inject style directives that will be interpreted by the browser. Even though style injection may appear less serious a threat than script injection, it has been shown that it enables a range of attacks, including secret exfiltration.

In this paper, we present the first large-scale study of the Web to measure the prevalence and significance of style injection using RPO. Our work shows that around 9 % of the websites in the Alexa Top 10,000 contain at least one vulnerable page, out of which more than one third can be exploited. We analyze in detail various impediments to successful exploitation, and make recommendations for remediation. In contrast to script injection, relatively simple countermeasures exist to mitigate style injection. However, there appears to be little awareness of this attack vector as evidenced by a range of popular Content Management Systems (CMSes) that we found to be exploitable.

The Web Conference (WWW), 2018

(Honorable Mention)

Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript Libraries on the Web

Web developers routinely rely on third-party JavaScript libraries such as jQuery to enhance the functionality of their sites. However, if not properly maintained, such dependencies can create attack vectors allowing a site to be compromised.

In this paper, we conduct the first comprehensive study of client-side JavaScript library usage and the resulting security implications across the Web. Using data from over 133 k websites, we show that 37% of them include at least one library with a known vulnerability; the time lag behind the newest release of a library is measured in the order of years. In order to better understand why websites use so many vulnerable or outdated libraries, we track causal inclusion relationships and quantify different scenarios. We observe sites including libraries in ad hoc and often transitive ways, which can lead to different versions of the same library being loaded into the same document at the same time. Furthermore, we find that libraries included transitively, or via ad and tracking code, are more likely to be vulnerable. This demonstrates that not only website administrators, but also the dynamic architecture and developers of third-party services are to blame for the Web’s poor state of library management.

The results of our work underline the need for more thorough approaches to dependency management, code maintenance and third-party code inclusion on the Web.

Network and Distributed System Security Symposium (NDSS), 2017

“Recommended For You”: A First Look at Content Recommendation Networks

One advertising format that has grown significantly in recent years are known as Content Recommendation Networks (CRNs). CRNs are responsible for the widgets full of links that appear under headlines like “Recommended For You” and “Things You Might Like”. Although CRNs have become quite popular with publishers, users complain about the low-quality of content promoted by CRNs, while regulators in the US and Europe have faulted CRNs for failing to label sponsored links as advertisements.

In this study, we present a first look at five of the largest CRNs, including their footprint on the web, how their recommendations are labeled, and who their advertisers are. Our findings reveal that CRNs still fail to prominently disclose the paid nature of their sponsored content. This suggests that additional intervention is necessary to promote accepted best-practices in the nascent CRN marketplace, and ultimately protect online users.

ACM Internet Measurement Conference (IMC), 2016