Accepted answer
Score: 41

Technically that is not a valid URL according to section 5 of RFC 1738. Browsers will automatically encode the ã character to %C3%A3 before sending the request to the server. The technically valid full URL here is: http://pt.wikipedia.org/wiki/Guimar%C3%A3es Pass that to the FILTER_VALIDATE_URL filter and it will work fine. The filter only validates according to spec; it doesn't try to fix or encode characters for you.
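As a rough illustration of the point above, here is a minimal sketch that runs both forms of the URL through filter_var. It assumes the script file is saved as UTF-8, and the variable names $raw and $encoded are only illustrative. rawurlencode is used here to stand in for what the browser does to the last path segment.

```php
<?php
// Raw URL containing a non-ASCII character (file assumed to be UTF-8).
$raw = "http://pt.wikipedia.org/wiki/Guimarães";

// Percent-encode only the final path segment, roughly as a browser would.
$encoded = "http://pt.wikipedia.org/wiki/" . rawurlencode("Guimarães");

// The raw form fails validation; the encoded form is returned unchanged.
var_dump(filter_var($raw, FILTER_VALIDATE_URL));
var_dump(filter_var($encoded, FILTER_VALIDATE_URL));
```

filter_var returns false on failure and the validated string on success, so compare against false rather than relying on truthiness alone.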

Score: 10

The following code uses filter_var but encodes non-ASCII characters before calling it. Hope this helps someone.


function validate_url($url) {
    // Extract only the path; the scheme and host are left untouched.
    $path = parse_url($url, PHP_URL_PATH);

    if (is_string($path) && $path !== '') {
        // Percent-encode each path segment so non-ASCII characters become %XX.
        $encoded_path = array_map('urlencode', explode('/', $path));
        $url = str_replace($path, implode('/', $encoded_path), $url);
    }

    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// example
if (!validate_url("http://somedomain.com/some/path/file1.jpg")) {
    echo "NOT A URL";
} else {
    echo "IS A URL";
}

Score: 4

The parsing starts here:

and is actually done in /trunk/ext/standard/url.c

At first glance I can't see anything that purposely rejects non-ASCII characters, so it's probably just a lack of Unicode support. PHP is not good at handling non-ASCII characters anywhere. :(

