[Python] URL Un-shorten-er

Saturday, 30 March 2013

Hi all,

I was playing with Python's BeautifulSoup library. It is a very good library for parsing and extracting data from HTML documents, even poorly written ones. While playing with it, I thought, why not make something interesting? I looked around for inspiration, and while browsing my Twitter feed I came across many shortened URLs. Why not demystify them? So I wrote a small script to un-shorten those URLs. It uses a website, URLXray, to resolve them. This is a very naive script, I know, but it still helped me learn BeautifulSoup.

I'm a novice in Python, so the script may look a bit long.

Here's the code:

#!/usr/bin/env python
# Simple script to un-shorten the shortened URLs using the website - URLXray.com
# Author: Rahul Binjve (@RahulBinjve)
# Usage: ./urlDecode.py URL
# Note: written for Python 2 (urllib2, print statements).
import sys
import urllib2

from bs4 import BeautifulSoup


def main():
    if len(sys.argv) < 2:
        # Use the real script name so the usage line always matches the file.
        print "\nUsage: %s \"URL You Want to decode\"" % sys.argv[0]
        sys.exit(1)
    result = decode(sys.argv[1])
    print "\nDecoded URL is -> ", result


# Decode function, all our work will be done here.
def decode(userArg):
    url = "http://urlxray.com/display.php?url=" + userArg
    print "\nUser provided URL -> ", userArg
    webPage = urllib2.urlopen(url)
    tastySoup = BeautifulSoup(webPage)
    # The script looks for the resolved link inside <div class="resultURL2">.
    for div in tastySoup.find_all("div", class_="resultURL2"):
        for a in div.find_all("a"):
            # Return the first non-empty href found.
            decoded = a.get("href")
            if decoded:
                return decoded
    # No link was found in the result block.
    return None


# Standard Python boilerplate
if __name__ == '__main__':
    main()
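
If you want to try the two main steps of the script without hitting the network, here is a rough sketch. It is written for Python 3 and uses only the standard library (html.parser instead of BeautifulSoup), so it is not the script above, just the same idea. The names build_lookup_url, ResultLinkParser, first_result_link and SAMPLE_PAGE are made up for illustration, and the sample markup is my assumption; only the endpoint and the resultURL2 class come from the script.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Endpoint taken from the script above; whether it still responds is not assumed.
LOOKUP_ENDPOINT = "http://urlxray.com/display.php"


def build_lookup_url(short_url):
    """Percent-encode the short URL so '&' or '?' inside it stay in one parameter."""
    return LOOKUP_ENDPOINT + "?" + urlencode({"url": short_url})


class ResultLinkParser(HTMLParser):
    """Collects the href of every <a> nested inside <div class="resultURL2">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # nesting depth of <div>s, counted from the result div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Start counting at the result div, keep counting for divs nested in it.
            if self.div_depth or "resultURL2" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1


def first_result_link(html):
    """Return the first link found in the result div, or None if there is none."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links[0] if parser.links else None


# Hypothetical markup, invented for illustration only.
SAMPLE_PAGE = """
<html><body>
  <div class="header"><a href="http://urlxray.com/">Home</a></div>
  <div class="resultURL2"><a href="http://example.com/a/long/page">link</a></div>
</body></html>
"""

print(build_lookup_url("http://bit.ly/abc?x=1&y=2"))
print(first_result_link(SAMPLE_PAGE))
```

Encoding the argument matters when the short URL carries its own query string; otherwise everything after the first "&" would be read as a separate parameter of the lookup request.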

Thanks for reading.
Cheers.

2 comments:

Aman Nougrahiya, 31 March 2013 at 03:18
Nice Job, Bro..:)
Keep Experimenting..:)
Unknown, 29 November 2013 at 00:49
jon said...

Hi Rahul,

Congratulations - it seems like you had an eventful and achievement-filled year! Nice summation of all your activities..keep blogging!

Best wishes for your new role and for the coming holidays season.

RahulB's Blog | InfoSec n' All
