Remove all .pyc files from a git repository

When working on a Python project, you may sometimes forget to .gitignore your
*.pyc files and end up with them tracked by the repo.

To fix this, you need to use git rm:

find . -name "*.pyc" -exec git rm -f {} \;
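For anyone who would rather script this step, here is a minimal Python sketch of the same idea. It is not a drop-in equivalent of the command above: it asks git for the list of tracked files instead of walking the file system, and it uses git rm --cached, which only untracks the files and leaves them on disk (git rm -f also deletes them from the working tree). The function names tracked_pyc_files and untrack_pyc_files are made up for this example.

```python
import os
import subprocess


def tracked_pyc_files(tracked_paths):
    """Return only the paths that end in .pyc."""
    return [p for p in tracked_paths if p.endswith(".pyc")]


def untrack_pyc_files(repo_dir="."):
    """Run `git rm --cached` on every tracked .pyc file under repo_dir.

    Hypothetical helper: returns the list of paths that were untracked.
    """
    # `git ls-files -z` prints the tracked paths separated by NUL bytes.
    out = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_dir, check=True, capture_output=True,
    ).stdout
    paths = [os.fsdecode(p) for p in out.split(b"\0") if p]
    targets = tracked_pyc_files(paths)
    if targets:
        # --cached removes the files from the index but keeps them on disk.
        subprocess.run(
            ["git", "rm", "--cached", "--quiet", "--"] + targets,
            cwd=repo_dir, check=True,
        )
    return targets
```

Calling untrack_pyc_files() from the repository root stages the removals; you still need to commit them yourself.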

Then, add a line:

*.pyc

to the .gitignore file in the repository root to have them permanently
ignored.
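The .gitignore step can be scripted too. The sketch below appends the pattern only if it is not already listed, so it is safe to run more than once. The helper name ensure_ignored is an assumption of this example, not something git provides.

```python
from pathlib import Path


def ensure_ignored(gitignore_path, pattern="*.pyc"):
    """Append pattern to the .gitignore file unless it is already listed.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(gitignore_path)
    text = path.read_text() if path.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    # Make sure the new pattern starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + pattern + "\n")
    return True
```

For example, ensure_ignored(".gitignore") run from the repository root adds the *.pyc line the first time and does nothing afterwards.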

Credit goes to Yuji.

htdebug, a script for tracing HTTP redirects

I recently wrote a simple Python script that traces HTTP redirects and
reports the rel=canonical links found at each hop. I called it htdebug.

Here it is:

#!/usr/bin/env python3

import sys
from io import BytesIO

import requests
from lxml import etree


def get_canonical(html_content):
    """Return the href values of all rel=canonical links in <head>."""
    # Redirect responses often have an empty body, which lxml cannot parse.
    if not html_content.strip():
        return []
    parser = etree.HTMLParser()
    tree = etree.parse(BytesIO(html_content), parser)
    return tree.xpath("/html/head/link[@rel='canonical']/@href")


if __name__ == "__main__":
    url = sys.argv[1]
    print("Probing {} for redirect path.".format(url))
    response = requests.get(url)
    # response.history holds the intermediate redirects, oldest first.
    for resp in response.history + [response]:
        print()
        print("[{}] {}".format(resp.status_code, resp.url))
        print("canonical link: {}".format(get_canonical(resp.content)))

After placing it in an executable file named htdebug in your $PATH, you can
use it like so:

$ htdebug http://yahoo.com
Probing http://yahoo.com for redirect path.

[301] http://yahoo.com/
canonical link: []

[200] https://www.yahoo.com/
canonical link: ['https://www.yahoo.com/']

HTTP status codes are shown in square brackets.
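If lxml is not available, the canonical-link lookup can be approximated with the standard library. This is only a sketch of an alternative, not what htdebug does: it uses html.parser instead of an XPath query, it does not check that the link sits inside <head>, and it accepts rel values with several space-separated tokens. The names CanonicalLinkParser and get_canonical_stdlib are invented for this example.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collect the href values of <link rel="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; match any of them.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def get_canonical_stdlib(html_text):
    """Return a list of canonical hrefs found in html_text (a str)."""
    parser = CanonicalLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs
```

Unlike the lxml version, this one expects decoded text, so with requests you would pass resp.text rather than resp.content.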

BuiltWith lookup script

BuiltWith is a web service that lets you check which web technologies are used on a given website. This simple script lets you do that directly from the command line.

#!/usr/bin/env python3

import sys
from io import BytesIO
from urllib.parse import urljoin

import requests
from lxml import etree

if __name__ == "__main__":
    domain = sys.argv[1]
    base_url = "http://builtwith.com/"
    url = urljoin(base_url, domain)
    print("Visiting {} ...\n".format(url))
    response = requests.get(url)

    parser = etree.HTMLParser()
    tree = etree.parse(BytesIO(response.content), parser)

    # Print the text of every <h3> inside a div whose class contains "techItem".
    tech_items = tree.xpath("//div[contains(@class, 'techItem')]//h3")
    for item in tech_items:
        print("".join(item.itertext()).strip())

After placing it in an executable file named builtwith in your $PATH, you can
use it like so:

$ builtwith 37signals.com
Visiting http://builtwith.com/37signals.com ...

GeoTrust SSL
RapidSSL
Dyn DNS
Google Apps for Business
Postmark
Campaign Monitor
DKIM
SPF
Shockwave Flash Embed
Google Conversion Tracking
Clicky
html5shiv
Prototype
YouTube
Amazon S3
IPhone / Mobile Compatible
Device Width
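One thing to watch out for: the script builds its lookup URL with urljoin, which works for a bare domain such as 37signals.com but returns the argument unchanged when you pass a full URL with its own scheme. The sketch below shows one possible way to normalize the argument first. The helper name lookup_url is hypothetical and not part of the script above.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "http://builtwith.com/"  # same base URL as in the script


def lookup_url(target):
    """Build the BuiltWith lookup URL for a bare domain or a full URL."""
    # Prefix scheme-less input with "//" so urlparse treats it as a host.
    parsed = urlparse(target if "://" in target else "//" + target)
    return urljoin(BASE_URL, parsed.netloc)
```

With this helper, both lookup_url("37signals.com") and lookup_url("https://37signals.com/") point at the same BuiltWith page.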