I was playing with Python's BeautifulSoup library. It is a very good library for parsing and extracting data from HTML documents, even poorly written ones. While I was playing with it, I thought: why not make something interesting? I looked around for inspiration, and while browsing my Twitter feed I came across many shortened URLs. Why not demystify them? So I wrote a small script to un-shorten those URLs, using the website URLXray to resolve them. It's a very naive script, I know, but it still helped me learn BeautifulSoup.
I'm new to Python, so the script may look a bit long.
Here's the code:
```python
#!/usr/bin/env python3
# Simple script to un-shorten shortened URLs using the website URLXray.com
# Author: Rahul Binjve (@RahulBinjve)
# Usage: ./urlDecode.py URL
import sys
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup


def main():
    if len(sys.argv) < 2:
        print('\nUsage: urlDecode.py "URL you want to decode"')
        sys.exit(1)
    result = decode(sys.argv[1])
    if result:
        print("\nDecoded URL is ->", result)
    else:
        print("\nCould not find a decoded URL in the URLXray response.")
        sys.exit(1)


# Decode function: all our work is done here.
def decode(userArg):
    # Percent-encode the user's URL so characters such as "?" and "&"
    # inside it don't break URLXray's own query string.
    url = "http://urlxray.com/display.php?" + urllib.parse.urlencode({"url": userArg})
    print("\nUser provided URL ->", userArg)
    webPage = urllib.request.urlopen(url)
    # Name the parser explicitly; bs4 warns when it has to guess one.
    tastySoup = BeautifulSoup(webPage, "html.parser")
    # The decoded URL is the first link inside URLXray's "resultURL2" div(s).
    for div in tastySoup.find_all("div", class_="resultURL2"):
        for a in div.find_all("a"):
            # a.get() returns None when there is no href; Tag.has_key() is deprecated in bs4.
            decoded = a.get("href")
            if decoded:
                return decoded
    return None


# Standard Python boilerplate
if __name__ == "__main__":
    main()
```
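As an aside, most URL shorteners work by answering with a plain HTTP redirect (a 3xx status plus a `Location` header pointing at the real address), so the final URL can also be found without a lookup site. The sketch below is an alternative approach, not what the script above does: it uses only the standard library's `urllib.request`, whose `urlopen()` follows standard redirects such as 301 and 302 by itself. The function name `unshorten` is just an illustrative choice, and it assumes the shortener uses an HTTP redirect rather than a JavaScript or meta-refresh page.

```python
import urllib.request


def unshorten(short_url, timeout=10):
    """Return the URL that short_url finally redirects to.

    Assumes the shortener uses ordinary HTTP redirects. urlopen() follows
    them automatically, and geturl() reports the address of the last
    response it received.
    """
    with urllib.request.urlopen(short_url, timeout=timeout) as response:
        return response.geturl()
```

Calling `unshorten()` with a shortened link returns the address it finally lands on. If the shortener (or the target site) answers with an error status, `urlopen()` raises `urllib.error.HTTPError` instead of returning.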
Thanks for reading.
Cheers.