htdebug, a script for tracing HTTP redirects

Recently, I have written a simple Python script that helps trace HTTP redirects
along with information on rel=canonical links. I called it htdebug.

Here it is:

#!/usr/bin/env python

import urllib
import urllib2
import sys
import requests

import lxml
from lxml import etree
from cStringIO import StringIO

def get_canonical(html_content):
    parser = etree.HTMLParser()
    tree = etree.parse(StringIO(html_content), parser)
    canonical_hrefs = tree.xpath("/html/head/link[@rel='canonical']/@href")
    return canonical_hrefs

if __name__ == "__main__":
    url = sys.argv[1]
    print "Probing {} for redirect path.".format(url)
    response = requests.get(url)
    for resp in response.history+[response]:
        print "[{}] {}".format(resp.status_code, resp.url)
        print "canonical link: {}".format(get_canonical(resp.content))

After placing it in a executable file named htdebug in your $PATH, you can
use it like so:

$ htdebug
Probing for redirect path.

canonical link: []

canonical link: ['']

HTTP status codes are shown in square brackets.

Leave a Reply

Your email address will not be published. Required fields are marked *