Tuesday, May 3, 2011

Making sure http appears in a web address

I have people posting their website address but variations are posted such as:

When I link to an address without http:// it takes the link as internal

<a href="theirsite.com">their site</a>

sending people to something like: http://mysite.com/theirsite.com

Another option I've tried is linking to something like mysite.com/?link=theirsite.com. This way I can do some link tracking etc. and then redirect people to the link, but it has the same problem:

//do some tracking etc here
$link = $_GET['link'];
header("Location: $link");
From stackoverflow
  • I would provide some validation or sanitization. Use a regex to check whether the address starts with http://. If it doesn't, either raise a validation error or put http:// at the start.

  • if "://" not in users_url:
        users_url = "http://" + users_url
    

    ...or equivalent, in the language of your choice.
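
As a minimal sketch of the check above, written out as a complete Python function: the name `ensure_scheme` and the `default_scheme` parameter are made up for illustration and are not part of the original answer. It also trims surrounding whitespace first, on the assumption that pasted addresses often carry stray spaces.

```python
def ensure_scheme(url: str, default_scheme: str = "http") -> str:
    """Return url with default_scheme prepended when it has no "://" in it."""
    url = url.strip()  # drop leading/trailing whitespace before checking
    if "://" not in url:
        url = f"{default_scheme}://{url}"
    return url


print(ensure_scheme("theirsite.com"))
print(ensure_scheme("  https://theirsite.com  "))
```

This only guarantees that a scheme is present. An empty or nonsensical input still comes back with a scheme in front of it, so some validation is probably still needed afterwards.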

  • You could use regular expressions to test input

    Regex exp = new Regex(
        @"http://(www\.)?([^\.]+)\.com",
        RegexOptions.IgnoreCase);
    
    OregonGhost : Hopefully the .com is just an example :D
  • put "http://" in the field by default, then validate the URL with something like

    if(eregi("^((http|https)://)?([[:alnum:]-])+(\.)([[:alnum:]]){2,4}([[:alnum:]/+=%&_.~?-]*)$", stripslashes(trim($_POST['link'])))){
        //link is valid
    }
    

    If the link does not validate, just show a message saying "the link you entered is invalid, make sure it starts with 'http://'".

    Fredrik Mörk : I like how this will also take care of the cases when people write "no" or "not yet" or stuff like that.
    Gumbo : +1 For the default value for the URL field. But I wouldn’t use `eregi`.
    Peter : good point with people posting "no" or "not yet"...
    The Pixel Developer : No need to use regular expressions in this case. PHP has built in URL validation using the "filter" extension. See my answer below.
    vartec : ereg is dead, you should use PCRE instead.
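
For comparison, here is a rough sketch of the same kind of check using Python's `re` module. The pattern is a simplified assumption, not a translation of the `eregi` expression above: it accepts an optional http/https scheme, a host with at least one dot, and an optional path, query, or fragment without whitespace. The names `URL_RE` and `looks_like_url` are hypothetical.

```python
import re

# Assumed, simplified pattern; it does not handle ports, userinfo, or IPv6 hosts.
URL_RE = re.compile(
    r"^(?:https?://)?"               # optional http:// or https://
    r"[a-z0-9-]+(?:\.[a-z0-9-]+)+"   # host made of labels with at least one dot
    r"(?:[/?#]\S*)?$",               # optional path/query/fragment, no whitespace
    re.IGNORECASE,
)


def looks_like_url(text: str) -> bool:
    """Return True when text loosely resembles a web address."""
    return URL_RE.match(text.strip()) is not None


for candidate in ("http://theirsite.com", "theirsite.com/page?x=1", "no", "not yet"):
    print(candidate, looks_like_url(candidate))
```

As noted in the comments above, entries such as "no" or "not yet" are rejected because they contain no dotted host.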
  • I would use something like this:

    $link = str_replace(array("\r", "\n"), '', trim($link));
    if (!preg_match('/^https?:\/\//', $link)) {
        $link = 'http://'.$link;
    }
    header('Location: '.$link);
    

    Another way would be to use the parse_url function to parse the given URL, see which parts are missing, and add them.

    Peter : might need to consider trailing space as noted by Elazar and also there was a small typo repalce. Cheers.
    Gumbo : Have you noticed the `trim`?
    Peter : oops, silly me!
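
The "parse it and fill in what is missing" idea from this answer might look roughly like the following in Python, where `urllib.parse.urlsplit` plays the role of PHP's parse_url. This is a sketch under assumptions: `normalize_link` and `default_scheme` are illustrative names, and the CR/LF stripping mirrors the `str_replace` call in the PHP snippet so the value cannot add extra header lines to a redirect.

```python
from urllib.parse import urlsplit


def normalize_link(raw: str, default_scheme: str = "http") -> str:
    """Return raw as an absolute link, adding the parts that are missing."""
    # Remove CR/LF (header-injection guard), then trim surrounding whitespace.
    link = raw.replace("\r", "").replace("\n", "").strip()
    parts = urlsplit(link)
    if parts.scheme and parts.netloc:
        return link  # already absolute, e.g. https://host/path
    if not parts.scheme and parts.netloc:
        return f"{default_scheme}:{link}"  # protocol-relative, e.g. //host/path
    return f"{default_scheme}://{link}"  # bare host such as theirsite.com


print(normalize_link("theirsite.com"))
print(normalize_link("//theirsite.com/a"))
```

The result is only made absolute, not validated, so inputs that are not web addresses at all (for example a `mailto:` link) would need a separate check.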
  • Please note that there's a real difference between www.site.com and site.com. Usually both work, but on some websites each leads to a different path (some badly configured websites won't work without the www, for instance). So you can't always prepend 'www' to the input.

    Another note: do handle leading spaces, so that ' http://' does not get an additional http:// prepended.

    My JavaScript regex-based solution:

    'http://'+field.replace(/^ *http:\/\//,'')
    

    You can verify that on the client side: just put code in a similar spirit in the onSubmit handler of your form.

  • No need to use regular expressions here. PHP has URL validation built in.

    Filter Var

    // FILTER_FLAG_HOST_REQUIRED was removed in PHP 8.0; FILTER_VALIDATE_URL already requires a scheme and host.
    var_dump((bool) filter_var('http://www.website.com', FILTER_VALIDATE_URL));
    var_dump((bool) filter_var('http://website.com', FILTER_VALIDATE_URL));
    var_dump((bool) filter_var('www.website.com', FILTER_VALIDATE_URL));
    var_dump((bool) filter_var('website.com', FILTER_VALIDATE_URL));
    

    Output

    bool(true)
    bool(true)
    bool(false)
    bool(false)
    

    Please do not jump straight to regular expressions for validation; PHP has a lot of built-in functions to deal with these scenarios.

    -Mathew

    Gumbo : But regular expressions are language independent.
    The Pixel Developer : To be fair, he posted it with a PHP tag and I've given him a PHP answer. It's a non-issue.
