Security Xss Prevention

// Security - Cross Site Scripting - Prevention:

To prevent Cross-Site Scripting (XSS):

1. The preferred option is to properly escape all untrusted data based on the 
   HTML context (body, attribute, JavaScript, CSS, or URL) that the data will be 
   placed into. 

2. Positive or “whitelist” server-side input validation is also recommended as 
   it helps protect against XSS, but is not a complete defense as many 
   applications require special characters in their input. Such validation 
   should, as much as possible, validate the length, characters, format, and 
   business rules on that data before accepting the input.

3. For rich content, consider auto-sanitization libraries like OWASP’s AntiSamy 
   or the Java HTML Sanitizer Project:
   https://www.owasp.org/index.php/AntiSamy
   https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project

4. Consider a Content Security Policy (CSP) as a defense-in-depth measure that
   restricts which scripts the browser is allowed to run.

Because XSS is caused by weak separation between code context and user data, the
protections we will describe all have a common theme: they strengthen the
barrier between code context and user data. The two basic approaches to defense
are output encoding and input filtering.

Input filtering works on the idea that malicious attacks are best caught at the 
point of user input. If the user inputs "<b>duck</b>", and the page strips out 
the code, no unauthorized code will be able to run. There are two types of input 
filtering:

1. Blacklisting - specific bad characters or combinations of characters are 
   banned, preventing their entry or storage.

2. Whitelisting - only characters or words from a known list of good entries 
   are permitted, preventing any malicious input.

Of the two choices, whitelist filtering is considered stronger. Whitelisting only
requires knowledge of good entries, while blacklisting may require knowledge of
all potential bad entries, which is often an impossible task.
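A minimal sketch of whitelist validation in Java, assuming a hypothetical username field that may contain only ASCII letters, digits and underscores, 3 to 20 characters long. The class name and the rule itself are illustrative assumptions, not taken from any framework:

```java
import java.util.regex.Pattern;

// Hypothetical validator for a "username" field: positive (whitelist)
// validation. It narrows what is accepted but is not a complete XSS defense.
final class UsernameValidator {
    // Only ASCII letters, digits and underscore, 3 to 20 characters.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]{3,20}");

    private UsernameValidator() {}

    static boolean isValid(String input) {
        // matches() requires the whole string to match, so nothing can
        // slip in before or after the allowed run of characters.
        return input != null && ALLOWED.matcher(input).matches();
    }
}
```

Describing only what is allowed means the check never needs a list of every dangerous character; "<b>duck</b>" fails simply because "<" is not on the list.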

Output encoding: Browsers have faced the problem of code context and data 
context for a very long time. The "<" character may have meaning in HTML, but it 
is also used in mathematical functions and browsers need a way to tell the 
difference. Encoding allows a character of specific meaning to be included in a 
block of text without that meaning via some controlled substitution. In the case 
of the "<" character, you can substitute it with the characters "&lt;" and your 
browser will understand that you desire the text version of "<", not the HTML 
version.

Output encoding is the process by which a server takes all the meaningful
characters for a specific context (HTML, JavaScript, URL) and substitutes them
with the characters that represent their text version. This is an effective way
to mitigate XSS, because characters that could act as code are never
represented in their meaningful form inside a block of code (only their
text equivalent).
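A minimal sketch of HTML output encoding for element content and quoted attribute values. The class and method names are hypothetical; in real code a maintained library such as the OWASP Java Encoder should be used instead:

```java
// Sketch of HTML entity encoding: each character that is meaningful to the
// HTML parser is replaced by an entity that the browser renders as text.
final class HtmlEncoder {
    private HtmlEncoder() {}

    static String forHtml(String data) {
        if (data == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(data.length() + 16);
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this, the input "<b>duck</b>" is emitted as "&lt;b&gt;duck&lt;/b&gt;", which the browser displays literally instead of making the text bold.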

While input filtering techniques work by preventing malicious data from entering
the system, output encoding techniques take the opposite approach: they
prevent malicious payloads already in the system from executing.

Output encoding is often considered more necessary than input filtering because
it does not rely on any upstream or downstream protections, and cannot be
bypassed by alternative input pathways.

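Note that the HTML parser does not decode character references inside a
<script> block, so the following does not run alert(1); the literal text
"&#x61;lert(1);" is handed to the JavaScript engine. HTML entity encoding is
therefore the wrong encoding for a JavaScript context.

<script>
&#x61;lert(1);
</script>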

Some papers or guides advocate using innerText as an alternative to innerHTML
to mitigate XSS. However, depending on the tag to which innerText is applied,
code can still be executed.

<script>
var tag = document.createElement("script");
tag.innerText = "<%=untrustedData%>";  // the text becomes script source
document.body.appendChild(tag);        // and executes when inserted
</script>

When possible, set the HttpOnly attribute on your cookies. This flag tells the 
browser to reveal the cookie only over HTTP or HTTPS connections, but to have 
document.cookie evaluate to a blank string when JavaScript code tries to read 
it. (Some browsers do still let JavaScript code overwrite or append to 
document.cookie, however.) If your application does require the ability for 
JavaScript to read the cookie, then you won’t be able to set HttpOnly. 
Otherwise, you might as well set this flag.
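As a sketch of what the flag looks like on the wire, the hypothetical helper below builds a Set-Cookie header value that ends in HttpOnly. In a servlet container you would normally call setHttpOnly(true) on a javax.servlet.http.Cookie instead; the Path=/ attribute here is an illustrative assumption:

```java
// Hypothetical helper that builds a Set-Cookie header value with the
// HttpOnly flag, so document.cookie will not expose the cookie to scripts.
// The Path=/ attribute is an illustrative assumption.
final class CookieHeaders {
    private CookieHeaders() {}

    static String sessionCookie(String name, String value) {
        return name + "=" + value + "; Path=/; HttpOnly";
    }
}
```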

Note that HttpOnly is not a defense against XSS; it is only a way to briefly
slow down attackers exploiting XSS with the simplest possible attack payloads.
It is not a bug or vulnerability for the HttpOnly flag to be absent.

Stored XSS Resulting from Arbitrary User Uploaded Content:

Applications such as Content Management, Email Marketing, etc. may need to allow
legitimate users to create and/or upload custom HTML, JavaScript or files. This
feature could be misused to launch XSS attacks. For instance, a lower privileged 
user could attack an administrator by creating a malicious HTML file that steals 
session cookies. The recommended protection is to serve such arbitrary content 
from a separate domain outside of the session cookie's scope.

Let’s say cookies are scoped to https://app.site.com. Even if customers can 
upload arbitrary content, you can always serve the content from an alternate 
domain that is outside of the scoping of any trusted cookies (session cookies 
and other sensitive information). As an example, pages on https://app.site.com
would reference customer-uploaded HTML templates as iframes using a link to

https://content.site.com/cust1/templates?templId=13&auth=someRandomAuthenticationToken

The authentication token would substitute for the session cookie since sessions 
scoped to app.site.com would not be sent to content.site.com. If the data being 
stored is sensitive, a one-time-use or short-lived token should be used.
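A minimal sketch of issuing and redeeming a one-time, short-lived token for the content domain. The class and method names are hypothetical, and the tokens are held in an in-memory map, so this assumes a single server process; a real deployment would need storage shared between app.site.com and content.site.com:

```java
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a one-time, short-lived token that stands in for the session
// cookie on the content domain (in-memory storage, illustration only).
final class ContentTokens {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<String, Instant> expiries = new ConcurrentHashMap<>();
    private final Duration lifetime;

    ContentTokens(Duration lifetime) {
        this.lifetime = lifetime;
    }

    // Called when app.site.com renders the iframe link (the auth= value).
    String issue() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        expiries.put(token, Instant.now().plus(lifetime));
        return token;
    }

    // Called by content.site.com; remove() makes each token single-use.
    boolean redeem(String token) {
        if (token == null) {
            return false;
        }
        Instant expiry = expiries.remove(token);
        return expiry != null && Instant.now().isBefore(expiry);
    }
}
```

The URL-safe Base64 alphabet keeps the token usable directly as a query parameter value.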

HTTP Response Splitting:

HTTP response splitting is a vulnerability closely related to XSS, and for which 
the same defensive strategies apply. Response splitting occurs when user data is 
inserted into an HTTP header returned to the client. Instead of inserting 
malicious script, the attack is to insert additional newline characters. Because 
headers and the response body are delimited by newlines in HTTP, this allows the 
attacker to insert their own headers and even construct their own page body
(which might have an XSS payload inside). To prevent HTTP response splitting,
filter ‘\n’ and ‘\r’ from any output used in an HTTP header.
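A minimal sketch of that filter in Java. The class and method names are hypothetical; one method rejects a header value containing CR or LF, and the other strips those characters as the text describes:

```java
// Sketch: keep untrusted data from ending the current header line and
// starting a new header (or the response body).
final class HeaderValues {
    private HeaderValues() {}

    // Strict variant: refuse the value outright.
    static String requireNoCrlf(String value) {
        if (value.indexOf('\r') >= 0 || value.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("CR/LF not allowed in header value");
        }
        return value;
    }

    // Lenient variant: remove CR and LF characters.
    static String stripCrlf(String value) {
        return value.replace("\r", "").replace("\n", "");
    }
}
```

Rejecting is usually easier to reason about than stripping, because a stripped value may still not be what the user intended.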

Validation frameworks:
http://static.springsource.org/spring/docs/2.0.8/reference/validation.html
http://oval.sourceforge.net/

Output Filtering and Encoding:

JSTL tags such as <c:out> have the escapeXml attribute set to true by default.
This default behavior ensures that HTML special characters are entity-encoded 
and prevents many XSS attacks. If any tags in your application set 
escapeXml="false" (such as for outputting the Japanese yen symbol) you need to 
apply some other escaping strategy. For JSF, the tag attribute is escape, and is 
also set to true by default for <h:outputText> and <h:outputFormat>.

Other page generation systems do not always escape output by default. Freemarker 
is one example. All application data included in a Freemarker template should be
surrounded with an <#escape> directive to do output encoding (e.g.
<#escape x as x?html>) or by manually adding ?html (or ?js_string for JavaScript
contexts) to each expression (e.g. ${username?html}).

Custom JSP tags or direct inclusion of user data variables with JSP expressions
(e.g. <%= request.getHeader("Referer") %>) or scriptlets (e.g.
<% out.println(request.getHeader("Referer")); %>) should be avoided.

Because URI encoding (percent-encoding) is defined on byte values 0-255,
higher-order code points are first transformed into a sequence of UTF-8 bytes,
and then each byte is URI encoded.
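A small Java sketch of that two-step process for a URI component: convert to UTF-8 bytes, then percent-encode every byte outside the unreserved set. The class name is hypothetical and the code is for illustration; Encode.forUriComponent from the OWASP Java Encoder is the tool to use in real pages:

```java
import java.nio.charset.StandardCharsets;

// Sketch of percent-encoding a URI component via UTF-8 bytes.
final class PercentEncoder {
    private PercentEncoder() {}

    // RFC 3986 unreserved characters are left as-is.
    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned (0-255)
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }
}
```

For example, "é" (U+00E9) becomes the two UTF-8 bytes C3 A9 and is emitted as "%C3%A9"; the euro sign becomes "%E2%82%AC".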

Be aware that JavaScript contains three built-in URI encoding and decoding
functions, none of which are suitable for security encoding:

1. escape() and unescape() have been deprecated because of improper UTF-8 handling.

2. encodeURI() and decodeURI() are designed to allow URIs with some illegal
   characters to be converted to legal URIs. These functions do not encode URI
   control characters such as "://" or ".".

3. encodeURIComponent() and decodeURIComponent() are designed to encode all URI 
   control characters but do not encode all characters such as the single quote.

The OWASP Java Encoder 1.2 jar and its Javadoc can be downloaded from Maven Central:
https://search.maven.org/remotecontent?filepath=org/owasp/encoder/encoder/1.2/encoder-1.2.jar
http://search.maven.org/remotecontent?filepath=org/owasp/encoder/encoder/1.2/encoder-1.2-javadoc.jar

To get started, simply add the encoder-1.2.jar, import org.owasp.encoder.Encode 
and start encoding.

Basic HTML Context:
<body><%= Encode.forHtml(UNTRUSTED) %></body>

HTML Content Context:
<textarea name="text"><%= Encode.forHtmlContent(UNTRUSTED) %></textarea>

HTML Attribute context:
<input type="text" name="address" 
  value="<%= Encode.forHtmlAttribute(UNTRUSTED) %>" />

CSS contexts:
<div style="width:<%= Encode.forCssString(UNTRUSTED) %>">
<div style="background:url('<%= Encode.forCssUrl(UNTRUSTED) %>')">

JavaScript Block context:
<script type="text/javascript">
var msg = "<%= Encode.forJavaScriptBlock(UNTRUSTED) %>";
alert(msg);
</script>

JavaScript Variable context:
<button 
onclick="alert('<%= Encode.forJavaScriptAttribute(UNTRUSTED) %>');">
click me</button>

Encode URL parameter values:
<a href="/search?value=<%= Encode.forUriComponent(UNTRUSTED) %>&order=1#top">

When handling a full URL with the OWASP Java Encoder, first verify that it is a
legal URL (validateURL stands for your own validation routine):

String untrustedUrl = validateURL(untrustedInput);

Then encode the URL as an HTML attribute when outputting it to the page. Note
that the link text needs to be encoded for a different context (HTML content).

<a href="<%= Encode.forHtmlAttribute(untrustedUrl) %>">
<%= Encode.forHtmlContent(untrustedLinkName) %>
</a>
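The validateURL(...) helper named above is not defined by the encoder. One possible implementation, sketched here under the assumption that only absolute http and https URLs with a host should be accepted (the class name is hypothetical), uses java.net.URI:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical validateURL: accept absolute http/https URLs with a host,
// reject everything else (javascript:, data:, relative paths, junk).
final class UrlValidator {
    private UrlValidator() {}

    static String validateURL(String untrustedInput) {
        if (untrustedInput == null) {
            throw new IllegalArgumentException("missing URL");
        }
        final URI uri;
        try {
            uri = new URI(untrustedInput.trim());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed URL", e);
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("relative URLs not accepted here");
        }
        scheme = scheme.toLowerCase(Locale.ROOT); // "JavaScript:" is still javascript
        if (!scheme.equals("http") && !scheme.equals("https")) {
            throw new IllegalArgumentException("scheme not allowed: " + scheme);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URL has no host");
        }
        return uri.toString();
    }
}
```

Validation only decides whether the URL is acceptable; the returned string still has to go through Encode.forHtmlAttribute when it is written into the href.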

To use the OWASP Java Encoder tag library inside a JSP:

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
<%@taglib prefix="e" uri="https://www.owasp.org/index.php/OWASP_Java_Encoder_Project" %>
<html>
   <head>
       <title><e:forHtml value="${param.title}" /></title>
   </head>
   <body>
       <h1>${e:forHtml(param.data)}</h1>
   </body>
</html>

The equivalent calls with the OWASP ESAPI encoder in Java code:

String safeHtml = ESAPI.encoder().encodeForHTML( request.getParameter( "input" ) );
String safeAttribute = ESAPI.encoder().encodeForHTMLAttribute(
  request.getParameter( "input" )
);
String safeJavaScript = ESAPI.encoder().encodeForJavaScript(
  request.getParameter( "input" )
);
// encodeForURL declares a checked EncodingException
String safeUrl = ESAPI.encoder().encodeForURL( request.getParameter( "input" ) );

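To pass JSON to the page, HTML-escape it on the server, place it in a
non-executable script block, and read it back with JSON.parse. html_escape and
html_unescape stand for your own escape and unescape routines (html_unescape is
not a built-in JavaScript function):

<script id="init_data" type="application/json">
  <%= html_escape(data.to_json) %>
</script>

// external js file
var dataElement = document.getElementById('init_data');
// read the escaped text of the script element and unescape it
var jsonText = dataElement.textContent || dataElement.innerText;
var initData = JSON.parse(html_unescape(jsonText));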

An alternative to escaping and unescaping JSON directly in JavaScript is to
normalize JSON server-side by converting '<' to '\u003c' before delivering it
to the browser.
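The normalization described above can be sketched in one line of Java. The class name is hypothetical. It works because JSON only allows '<' inside string literals, where the JSON escape for U+003C means the same character to JSON.parse, so the data is unchanged while the HTML parser never sees a closing script tag:

```java
// Sketch: make server-generated JSON safe to embed in a <script> block by
// replacing every '<' with its JSON escape sequence for U+003C.
final class JsonEmbedding {
    private JsonEmbedding() {}

    static String escapeForScriptBlock(String json) {
        return json.replace("<", "\\u003c");
    }
}
```

The output can be placed inside the script block and read with JSON.parse(dataElement.textContent), without a separate HTML unescape step.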

When you put untrusted data into a stylesheet or a style tag, be careful: CSS is
surprisingly powerful and can be used for numerous attacks. Therefore, it's
important that you only use untrusted data in a property value and not in
other places in style data. You should stay away from putting untrusted data
into complex properties like url, behavior, and custom (-moz-binding). You
should also not put untrusted data into IE’s expression property value, which
allows JavaScript.

Some CSS contexts can never safely use untrusted data as input, EVEN IF
PROPERLY CSS ESCAPED! You will have to ensure that URLs only start with "http",
not "javascript", and that properties never start with "expression".

{ background-url : "javascript:alert(1)"; }  // and all other URLs
{ text-size: "expression(alert('XSS'))"; }   // only in IE

Except for alphanumeric characters, escape all characters with ASCII values less 
than 256 with the \HH escaping format. DO NOT use any escaping shortcuts like \" 
because the quote character may be matched by the HTML attribute parser which 
runs first. These escaping shortcuts are also susceptible to "escape-the-escape" 
attacks where the attacker sends \" and the vulnerable code turns that into \\" 
which enables the quote.
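A minimal sketch of the \HH rule above in Java (the class name is hypothetical). One detail the rule leaves implicit: a CSS hex escape absorbs any hex digits that follow it, so "<" followed by "a" would be misread as one code point. This sketch ends each escape with a single space, which the CSS parser consumes as the escape terminator:

```java
// Sketch of the CSS escaping rule: ASCII letters and digits pass through,
// every other character below 256 is written as a hex escape followed by
// one terminating space. Characters at or above 256 are left unchanged.
final class CssEscaper {
    private CssEscaper() {}

    static String escape(String data) {
        StringBuilder out = new StringBuilder(data.length() * 2);
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (c < 256 && !alnum) {
                out.append('\\').append(Integer.toHexString(c)).append(' ');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that a double quote comes out as "\22 " rather than the \" shortcut the rule warns against, so neither the HTML attribute parser nor an escape-the-escape trick can turn it back into a live quote.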

If the attribute is quoted, breaking out requires the corresponding quote. All
attributes should be quoted, but your encoding should be strong enough to prevent
XSS when untrusted data is placed in unquoted contexts. Unquoted attributes can
be broken out of with many characters, including [space] % * + , - / ; < = > ^
and |. Also, the </style> tag will close the style block even though it is
inside a quoted string, because the HTML parser runs before the CSS
parser. Please note that we recommend aggressive CSS encoding and validation to
prevent XSS attacks for both quoted and unquoted attributes.

Do not encode complete or relative URLs with URL encoding! If untrusted input
is meant to be placed into href, src or other URL-based attributes, it should be
validated to make sure it does not point to an unexpected protocol, especially
javascript: links. URLs should then be encoded based on the context of display
like any other piece of data.

To handle a full untrusted URL within an HTML anchor tag attribute using the
OWASP Java Encoder 1.5 project, we must first validate the URL and then perform
Encode.forHtmlAttribute(untrustedURL). The same pattern with ESAPI (the "URL"
type refers to a validation pattern configured in ESAPI.properties):

<%
String userURL = request.getParameter( "userURL" );
boolean isValidURL = ESAPI.validator().
  isValidInput("URLContext", userURL, "URL", 255, false);
if (isValidURL) {
%>
<a href="<%= ESAPI.encoder().encodeForHTMLAttribute(userURL) %>">link</a>
<%
}
%>

There are numerous methods which implicitly eval() data passed to them. Make
sure that any untrusted data passed to these methods is delimited with string
delimiters and enclosed within a closure, or JavaScript encoded to N levels
based on usage, and wrapped in a custom function. Also make sure that the
untrusted data is not sent to dangerous methods (such as the HTML rendering
methods listed under Rules below) within the custom function, or handle it by
adding an extra layer of encoding.

Utilizing a Closure:

The example that follows illustrates using closures to avoid double JavaScript 
encoding.

setTimeout((function(param) { return function() {
         customFunction(param);
         }
})("<%=Encoder.encodeForJS(untrustedData)%>"), y);

Using N-Levels of Encoding:

If your code looked like the following, you would only need to double JavaScript
encode the input data.

setTimeout("customFunction('<%=doubleJavaScriptEncodedData%>', y)");
function customFunction (firstName, lastName) {
      alert("Hello " + firstName + " " + lastName);
}

The doubleJavaScriptEncodedData has its first layer of JavaScript encoding 
reversed (upon execution) in the single quotes. Then the implicit eval() of 
setTimeout() reverses another layer of JavaScript encoding to pass the correct 
value to customFunction. The reason why you only need to double JavaScript 
encode is that the customFunction function did not itself pass the input to 
another method which implicitly or explicitly called eval(). If "firstName" was 
passed to another JavaScript method which implicitly or explicitly called eval() 
then <%=doubleJavaScriptEncodedData%> above would need to be changed to 
<%=tripleJavaScriptEncodedData%>.
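To make the layers concrete, here is a small Java sketch (the class name JsLayers is hypothetical). Unlike a real JavaScript encoder, which leaves alphanumerics alone, it writes every character as a four-hex-digit JavaScript escape so each layer is visible. Encoding "A" once gives the six characters of "\u0041"; encoding that result again gives the double-encoded value shown in the comment of the comparison example below:

```java
// Worked illustration of N levels of JavaScript encoding: every character,
// including letters and digits, becomes a four-hex-digit JavaScript escape.
final class JsLayers {
    private JsLayers() {}

    static String encodeAll(String s) {
        StringBuilder out = new StringBuilder(s.length() * 6);
        for (int i = 0; i < s.length(); i++) {
            out.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return out.toString();
    }
}
```

Each eval() (or implicit eval) in the data flow removes exactly one of these layers, which is why the number of encoding passes must match the number of evaluations.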

An important implementation note is that if the JavaScript code tries to utilize 
the double or triple encoded data in string comparisons, the value may be 
interpreted as different values based on the number of eval() calls the data has
passed through before being passed to the if comparison and the number of times
the value was JavaScript encoded.

If "A" is double JavaScript encoded then the following if check will return 
false.

var x = "doubleJavaScriptEncodedA";  //\u005c\u0075\u0030\u0030\u0034\u0031
if (x == "A") {
   alert("x is A");
} else if (x == "\\u0041") {
   alert("This is what pops");
}

This brings up an interesting design point. Ideally, the correct way to apply 
encoding and avoid the problem stated above is to server-side encode for the 
output context where data is introduced into the application. Then client-side 
encode (using a JavaScript encoding library such as ESAPI4JS) for the individual
subcontext (DOM methods) which untrusted data is passed to. ESAPI4JS
(located at http://bit.ly/9hRTLH) and jQuery Encoder (located at
https://github.com/chrisisbeef/jquery-encoder/blob/master/src/main/javascript/org/owasp/esapi/jquery/encoder.js)
are two client-side encoding libraries developed by Chris Schmidt.

var input = "<%=Encoder.encodeForJS(untrustedData)%>";  //server-side encoding
window.location = ESAPI4JS.encodeForURL(input);  //URL encoding in JavaScript
document.writeln(ESAPI4JS.encodeForHTML(input));  //HTML encoding in JavaScript

Using a sanitizer:

If your application handles markup -- untrusted input that is supposed to 
contain HTML -- it can be very difficult to validate. Encoding is also 
difficult, since it would break all the tags that are supposed to be in the 
input. Therefore, you need a library that can parse and clean HTML formatted 
text.

OWASP AntiSamy: https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project
import org.owasp.validator.html.*;
Policy policy = Policy.getInstance(POLICY_FILE_LOCATION);
AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(dirtyInput, policy);
MyUserDAO.storeUserProfile(cr.getCleanHTML()); // some custom function

OWASP Java HTML Sanitizer:
https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project
import org.owasp.html.Sanitizers;
import org.owasp.html.PolicyFactory;
PolicyFactory sanitizer = Sanitizers.FORMATTING.and(Sanitizers.BLOCKS);
String cleanResults = sanitizer.sanitize("<p>Hello, <b>World!</b>");
http://owasp-java-html-sanitizer.googlecode.com/svn/trunk/distrib/javadoc/org/owasp/html/Sanitizers.html

Other sanitizer libraries:
http://htmlpurifier.org/ (PHP)
https://github.com/ecto/bleach (JavaScript / Node)
https://pypi.python.org/pypi/bleach (Python)

Rules:
0. Never Insert Untrusted Data Except in Allowed Locations
1. HTML Escape Before Inserting Untrusted Data into HTML Element Content
2. Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes
3. JavaScript Escape Before Inserting Untrusted Data into JavaScript Data Values
   1. HTML escape JSON values in an HTML context and read the data with 
      JSON.parse
4. CSS Escape And Strictly Validate Before Inserting Untrusted Data into HTML 
   Style Property Values.
5. URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values.
6. Sanitize HTML Markup with a Library Designed for the Job
7. Always JavaScript encode and delimit untrusted data as quoted strings when 
   entering the application:
   var x = "<%=encodedJavaScriptData%>";
8. Use document.createElement("…"), element.setAttribute("…", "value"),
   element.appendChild(…), etc. to build dynamic interfaces. Please note, 
   element.setAttribute is only safe for a limited number of attributes. 
   Dangerous attributes include any attribute that is a command execution 
   context, such as onclick or onblur. 
9. Avoid use of HTML rendering methods:
   a. element.innerHTML
   b. element.outerHTML
   c. document.write(...)
   d. document.writeln(...)
10. Understand the dataflow of untrusted data through your JavaScript code. If
    you do have to use the methods above, remember to HTML and then JavaScript
    encode the untrusted data.
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License