This is Sanitize doing the safest thing by default. It assumes that the portion of the URL before the : is a protocol (or a scheme in the terminology of RFC 1738), and since #fn isn't in the protocol whitelist, the entire href attribute is removed.

You can allow URLs like this by adding #fn to the protocol whitelist:

allowed_protocols = {'a' => {'href' => ['#fn', 'http', 'https', 'mailto', :relative]}}

To whoever downvoted this: Ryan is the guy who wrote Sanitize. If anyone knows the right answer, it's probably him.

This worked great, thanks! Just a heads-up for anyone else implementing this solution: you'll also need to add '#fnref' for the back-links that point from each footnote up to where it is referenced. E.g. allowed_protocols = {'a' => {'href' => ['#fn', '#fnref', 'http', 'https', 'mailto', :relative]}}
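To make the failure mode concrete, here is a minimal Python sketch of the rule described in the answer: treat everything before the first colon as the protocol and check it against an allowlist. The helper name href_allowed and its regex are hypothetical illustrations, not Sanitize's actual implementation.

```python
import re

# Hypothetical approximation of the rule described above, not Sanitize's
# real code: the text before the first ':' (and before any '/' or '?')
# is treated as the protocol.
PROTOCOL_RE = re.compile(r"\A\s*([^:/?]*):")


def href_allowed(href, allowed, allow_relative=True):
    """Return True if the 'protocol' part of href is in the allowlist."""
    match = PROTOCOL_RE.match(href)
    if match is None:
        # No colon before a '/' or '?', so treat it as a relative URL.
        return allow_relative
    return match.group(1).lower() in allowed


allowed = {"#fn", "http", "https", "mailto"}
print(href_allowed("#fn:1", allowed))                # True: "#fn" is allowlisted
print(href_allowed("#fnref:1", allowed))             # False: needs "#fnref" too
print(href_allowed("javascript:alert(1)", allowed))  # False
```

This also shows why the '#fnref' follow-up is needed: "#fnref:1" yields its own pseudo-protocol, "#fnref", which is a different string from "#fn".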

ruby on rails - Sanitize gem doesn't like colon inside href attribute ...

ruby-on-rails ruby sanitize

Rails' sanitize helper works on HTML strings, not URLs. That means you can't sanitize a URL by itself, but you can sanitize a piece of HTML that contains a link with a malicious URL. For example:

<%= sanitize "<a href='#{@url}'>Things</a>" %>
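A related pattern is to check the URL's scheme yourself and then escape it into the attribute. The following Python sketch only illustrates that idea; it is not Rails' sanitize. The names safe_link and SAFE_SCHEMES are hypothetical, and the allowlist is an assumption you would adapt.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical allowlist; "" means a relative URL such as "/about".
SAFE_SCHEMES = {"http", "https", "mailto", ""}


def safe_link(url, text):
    """Build an <a> tag, refusing URLs with unexpected schemes."""
    scheme = urlsplit(url.strip()).scheme  # urlsplit lowercases the scheme
    if scheme not in SAFE_SCHEMES:
        url = "#"  # drop the URL instead of trying to repair it
    # quote=True also escapes " and ', so the value cannot leave the attribute
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


print(safe_link("javascript:alert(1)", "Things"))  # <a href="#">Things</a>
```

Dropping a rejected URL entirely is a deliberate choice here: trying to "clean" a javascript: URL tends to miss encoded variants.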

ruby - Sanitizing URL to prevent XSS in Rails - Stack Overflow

ruby-on-rails ruby xss

If you want to sanitize the URL on the server side, use mysql_escape_string($_SERVER['REQUEST_URI']). Note that this function was deprecated in PHP 4.3.0 and, together with the rest of the original mysql extension, removed in PHP 7.0, so on current PHP use prepared statements (PDO or mysqli) instead. You can also use var_dump($_SERVER) to see all the values available and experiment with them. By the way, why are you sanitizing the URL? What do you want to do with it? I assumed you wanted to put it into a database, but if you just want to remove any HTML code from it, the browser will already do the escaping, so you don't really need to do it.

If you want to sanitize it on the client side, there is no point, since an attacker can always craft their own HTTP request with their own tools, without using a browser at all.
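For the database case, the alternative to escaping by hand is a bound parameter. Here is a language-neutral Python sketch using the standard-library sqlite3 driver, chosen only as a stand-in for MySQL because it needs no server; the visits table and log_request_uri function are hypothetical.

```python
import sqlite3


def log_request_uri(conn, request_uri):
    # The value travels separately from the SQL text, so quotes or
    # comment markers inside the URI cannot change the statement.
    conn.execute("INSERT INTO visits (uri) VALUES (?)", (request_uri,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (uri TEXT)")
log_request_uri(conn, "/search?q=' OR 1=1 --")
print(conn.execute("SELECT uri FROM visits").fetchone()[0])
```

The stored value is the raw URI, unchanged; it is only safe for SQL, and still needs HTML escaping if it is ever printed on a page.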

mysql - PHP Sanitize Url from Address Bar - Stack Overflow

php mysql

Magic quotes are inherently broken. They were meant to sanitize input to the PHP script, but without knowing how that input will be used, it's impossible to sanitize it correctly. If anything, you're better off checking whether magic quotes are enabled, calling stripslashes() on $_GET/$_POST/$_COOKIE/$_REQUEST if they are, and then sanitizing your variables at the point where you actually use them: urlencode() if you're putting a value in a URL, htmlentities() if you're printing it back to a web page, or your database driver's escaping function if you're storing it in a database. Note that those input arrays can contain sub-arrays, so you may need to write a function that recurses into them to strip those slashes too.
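The recursive un-escaping step can be sketched in Python as follows. This is an analogue written for illustration, not PHP code: strip_slashes and strip_slashes_deep are hypothetical names, and nested dicts and lists stand in for PHP's sub-arrays.

```python
import re

# Python analogue of stripslashes(): in this sketch "\x" becomes "x",
# "\\" becomes "\", "\0" becomes a NUL character, and a lone trailing
# backslash is dropped.
_SLASHED = re.compile(r"\\(.?)", re.DOTALL)


def strip_slashes(text):
    return _SLASHED.sub(lambda m: "\0" if m.group(1) == "0" else m.group(1), text)


def strip_slashes_deep(value):
    """Recurse into nested dicts/lists (like $_POST sub-arrays)."""
    if isinstance(value, dict):
        return {key: strip_slashes_deep(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_slashes_deep(item) for item in value]
    if isinstance(value, str):
        return strip_slashes(value)
    return value  # numbers and other types are left untouched
```

Keys are left alone in this sketch; only string values are un-escaped.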

"This feature has been DEPRECATED as of PHP 5.3.0 and REMOVED as of PHP 6.0.0. Relying on this feature is highly discouraged. Magic Quotes is a process that automagically escapes incoming data to the PHP script. It's preferred to code with magic quotes off and to instead escape the data at runtime, as needed."

Except that PHP6 never saw the light of day.

It was actually removed in PHP 5.4.

security - Magic quotes in PHP - Stack Overflow

php security magic-quotes

I recommend* URLify for PHP (480+ stars on GitHub): "the PHP port of URLify.js from the Django project. Transliterates non-ascii characters for use in URLs".

<?php

echo URLify::filter (' J\'étudie le français ');
// "jetudie-le-francais"

echo URLify::filter ('Lo siento, no hablo español.');
// "lo-siento-no-hablo-espanol"

?>

<?php

echo URLify::filter ('фото.jpg', 60, "", true);
// "foto.jpg"

?>

*None of the other suggestions matched my criteria:

  • Should not depend on iconv since it behaves differently on different systems
  • Popular (for instance many stars on Github)

As a bonus, URLify also removes certain words and strips away all characters not transliterated.
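For comparison, here is a dependency-free Python sketch using Unicode normalization instead of a transliteration table. This is a different technique from URLify: it only strips accents from Latin letters, so non-Latin scripts such as Cyrillic are dropped rather than transliterated. The function name slugify and the 60-character default are assumptions.

```python
import re
import unicodedata


def slugify(text, max_length=60):
    # NFKD splits "é" into "e" + a combining accent; encoding to ASCII
    # with errors="ignore" then drops the accent. Unlike URLify, this
    # cannot transliterate other scripts: Cyrillic letters simply vanish.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    text = text.replace("'", "")  # "j'etudie" -> "jetudie"
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return text[:max_length].rstrip("-")


print(slugify(" J'étudie le français "))  # jetudie-le-francais
```

The empty result for a purely Cyrillic title is exactly the gap that URLify's transliteration tables fill.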

php - Sanitizing strings to make them URL and filename safe? - Stack O...

php url filenames sanitization

I have entry titles with all kinds of weird Latin characters as well as some HTML tags that I needed to translate into a useful dash-delimited filename format. I combined @SoLoGHoST's answer with a couple of items from @Xeoncross's answer and customized it a bit.

function sanitize($string, $force_lowercase = true) {
    // Clean up titles for filenames: remove HTML tags first
    $clean = strip_tags($string);
    // Map accented Latin-1 letters to their unaccented ASCII equivalents
    $clean = strtr($clean, array('Š' => 'S','Ž' => 'Z','š' => 's','ž' => 'z','Ÿ' => 'Y','À' => 'A','Á' => 'A','Â' => 'A','Ã' => 'A','Ä' => 'A','Å' => 'A','Ç' => 'C','È' => 'E','É' => 'E','Ê' => 'E','Ë' => 'E','Ì' => 'I','Í' => 'I','Î' => 'I','Ï' => 'I','Ñ' => 'N','Ò' => 'O','Ó' => 'O','Ô' => 'O','Õ' => 'O','Ö' => 'O','Ø' => 'O','Ù' => 'U','Ú' => 'U','Û' => 'U','Ü' => 'U','Ý' => 'Y','à' => 'a','á' => 'a','â' => 'a','ã' => 'a','ä' => 'a','å' => 'a','ç' => 'c','è' => 'e','é' => 'e','ê' => 'e','ë' => 'e','ì' => 'i','í' => 'i','î' => 'i','ï' => 'i','ñ' => 'n','ò' => 'o','ó' => 'o','ô' => 'o','õ' => 'o','ö' => 'o','ø' => 'o','ù' => 'u','ú' => 'u','û' => 'u','ü' => 'u','ý' => 'y','ÿ' => 'y'));
    // Expand special letters and ligatures; turn the em dash into a plain dash
    $clean = strtr($clean, array('Þ' => 'TH', 'þ' => 'th', 'Ð' => 'DH', 'ð' => 'dh', 'ß' => 'ss', 'Œ' => 'OE', 'œ' => 'oe', 'Æ' => 'AE', 'æ' => 'ae', 'µ' => 'u', '—' => '-'));
    // Whitespace becomes a dash, then everything except ASCII letters, digits and dashes is removed
    $clean = preg_replace('/\s+/', '-', $clean);
    $clean = preg_replace('/[^a-z0-9-]/i', '', $clean);
    // Collapse runs of dashes (str_replace("--", "-") would turn "---" into "--")
    $clean = preg_replace('/-+/', '-', $clean);

    if (!$force_lowercase) {
        return $clean;
    }
    return function_exists('mb_strtolower') ? mb_strtolower($clean, 'UTF-8') : strtolower($clean);
}

I needed to manually add the em dash character (U+2014) to the translation array. There may be others, but so far my file names are looking good.
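The same table-driven idea, sketched in Python with str.translate, which (like strtr with an array) can map one character to several. Only a small excerpt of the mapping is included, and the tag removal is a crude regex rather than a real equivalent of strip_tags(); filename_slug is a hypothetical name.

```python
import re

# Small excerpt of the strtr() table above, using str.translate instead.
# A dict value may be longer than one character ("ß" -> "ss").
TRANSLIT = str.maketrans({
    "è": "e", "é": "e", "ç": "c", "ñ": "n", "ü": "u", "ö": "o",
    "ß": "ss", "æ": "ae", "Æ": "AE", "þ": "th", "\u2014": "-",
})


def filename_slug(title, force_lowercase=True):
    text = re.sub(r"<[^>]*>", "", title)       # crude tag removal, not strip_tags()
    text = text.translate(TRANSLIT)
    text = re.sub(r"\s+", "-", text)           # whitespace -> dash
    text = re.sub(r"[^A-Za-z0-9-]", "", text)  # drop anything else
    text = re.sub(r"-+", "-", text)            # collapse runs of dashes
    return text.lower() if force_lowercase else text


print(filename_slug("<b>Crème</b> façade \u2014 Straße"))  # creme-facade-strasse
```

Any character missing from the table (for example "ø" here) is silently removed by the final filter, which is why entries such as the em dash had to be added by hand.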

Part 1: My dads urburts?theyre (not) the best!

'' => 'l', '' => 'L', '' => 'c', '' => 'C', '' => 't', '' => 'T', '' => 'n', '' => 'N', '' => 'l', '' => 'L', '' => 'R', '' => 'r', '' => 'e', '' => 'E', '' => 'u', '' => 'U'

And no doubt many more. I'm actually trying to figure out if there exists an ISO- set that includes combinations of characters. How does one "choose" one set if the content demands characters from all of them? UTF-8 I'm assuming...

$string = transliterator_transliterate('Any-Latin;Latin-ASCII;', $string);

This requires the intl extension, so it only works if you can install PHP extensions on your server (or hosting).

Ah, got it. Thanks @JasomDotnet, I have my current solution working for now, but it's a limited character set, so the extension is worth checking out.

php - Sanitizing strings to make them URL and filename safe? - Stack O...

php url filenames sanitization

Taken from the OWASP page linked to below: Untrusted data is most often data that comes from the HTTP request, in the form of URL parameters, form fields, headers, or cookies. But data that comes from databases, web services, and other sources is frequently untrusted from a security perspective. That is, it might not have been perfectly validated.

In most cases, you do need more protection if you are taking input from ANY source and outputting it to HTML. This includes data retrieved from files, databases, etc., not just your text boxes. You could have a website that is perfectly locked down, and someone could still go directly to the database with another tool and insert malicious script.

Even if you're taking data from a database where only a trusted user is able to enter the data, you never know if that trusted user will inadvertently copy and paste in some malicious script from a website.

Unless you absolutely positively trust any data that will be output on your website and there is no possible way for a script to inadvertently (or maliciously in case of an attacker or disgruntled employee) put dangerous data into the system, you should sanitize all output.

and go through the other known threats on the site as well.

In case you missed it, the Microsoft.AntiXss library is a very good tool to have at your disposal. In addition to a better version of the HtmlEncode function, it also has nice features like GetSafeHtmlFragment() for when you WANT to include untrusted HTML in your output and have it sanitized. This article shows proper usage: http://msdn.microsoft.com/en-us/library/aa973813.aspx (it is old, but still relevant).

Yes, so what's the best way to sanitize? I feel like ASP.NET already sanitizes ALL input from text boxes, so I don't know why I need to do anything extra. What should I do then, Html.Encode() all text box data being printed on the page? I thought ASP.NET already did this.

I added another paragraph at about the time you were typing the comment - that paragraph answers the question.

ASP.NET does NOT sanitize data from text boxes; it FILTERS what can come IN. Sanitization happens when output is sent back OUT from the server to the user.

Someone on another page of this site said that Microsoft AntiXSS is overkill and unnecessary, because the standard ways of filtering with HtmlEncode etc. are good enough.
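Encoding at output time, per context, can be sketched in Python with the standard library: html.escape plays roughly the role of HtmlEncode, and urllib.parse.quote handles a URL path segment. This is an analogue for illustration, not the AntiXss API; render_comment and the /users/ path are hypothetical.

```python
from html import escape
from urllib.parse import quote


def render_comment(author, comment, profile_id):
    # Encode each untrusted value for the context it lands in:
    # - URL path segment: percent-encode (safe="" also encodes "/")
    # - HTML text and attribute values: entity-encode, including quotes
    url = "/users/" + quote(profile_id, safe="")
    return '<p><a href="{}">{}</a>: {}</p>'.format(
        escape(url, quote=True), escape(author), escape(comment)
    )


print(render_comment("Eve", "<script>alert(1)</script>", "a/b c"))
```

The point is that the same input needs different treatment depending on where it is written, which is why encoding belongs at output rather than input.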

Do I need extra XSS security for ASP.NET 4 websites? - Stack Overflow

asp.net security xss

// CLEAN ILLEGAL CHARACTERS
function clean_filename($source_file)
{
    // Each key is replaced by its value; none of the replacements
    // contain a key, so a single strtr() pass matches str_replace()
    $replacements = array(
        ' ' => '_',
        '&' => 'and',
        '$' => 'S',
        ',' => '_',
        // These characters are simply removed
        '!' => '', '@' => '', '#' => '', '^' => '', '(' => '',
        ')' => '', '+' => '', '=' => '', '[' => '', ']' => '',
    );

    return strtr($source_file, $replacements);
}
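The function above is a blocklist: characters it does not list, such as / and ., pass through unchanged, so a value like ../etc/passwd comes out as-is. A minimal Python sketch of the allowlist alternative follows; the leading-dot rule is an extra assumption, not part of the original.

```python
import re


def clean_filename(name):
    # Same substitutions as the PHP function, applied in sequence
    for old, new in ((" ", "_"), ("&", "and"), ("$", "S"), (",", "_")):
        name = name.replace(old, new)
    # Allowlist rather than blocklist: keep only letters, digits, "_", "-", "."
    name = re.sub(r"[^A-Za-z0-9_.-]", "", name)
    # Extra rule (an assumption, not in the PHP version): no leading dots,
    # so hidden files like ".htaccess" or a bare ".." cannot be produced
    return name.lstrip(".")


print(clean_filename("My file & notes (v2)!.txt"))  # My_file_and_notes_v2.txt
```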

php - Sanitizing strings to make them URL and filename safe? - Stack O...

php url filenames sanitization

I recommend avoiding hidden input fields for passing variables. It's better to use the URL itself when calling the action: action="printpdf.php?cname=XXXX". If the variable can be tampered with by the user, do one of three things: (i) store the variable in the server session ($_SESSION['cname'] = XXXX;), where it can hardly be manipulated; (ii) encode, hash or encrypt the URL string; (iii) sanitize the variable when reading it back in the printpdf.php file. Hidden fields are easy to manipulate and are a primitive way of passing data.
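Option (ii) can be sketched in Python by signing the value with an HMAC, so that a tampered value is detected rather than hidden. SECRET_KEY, sign and verify are hypothetical names, and the key shown is a placeholder; signing does not conceal the value (that would need encryption), and the value still needs escaping wherever it is output.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep it server-side


def sign(value):
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value + "." + mac


def verify(signed):
    """Return the original value, or None if it was altered."""
    value, sep, mac = signed.rpartition(".")  # the hex MAC never contains "."
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(mac, expected) else None
```

Splitting on the last "." lets the value itself contain dots, and hmac.compare_digest avoids leaking timing information during the comparison.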

html - To show php variable in pdf file generated using fpdf - Stack O...

php html pdf fpdf