10 Informative Infographics

Large, complex data sets are hard to comprehend and navigate, especially when they are presented as plain text. Presenting them that way is also just plain boring and turns off the audience. Information graphics, or infographics, are visual representations of that data. They are easier to comprehend, the graphics help us relate to the data, and they add an "emotional aspect" as well. Done well, an infographic communicates clearly and is self-explanatory.

"Information graphics or infographics are graphic visual representations of information, data or knowledge. These graphics present complex information quickly and clearly." – Wikipedia

To appreciate them, here are some of the most creative, informative and well-designed infographics I have found.

1. World Cup 2010 Calendar – http://www.marca.com/deporte/futbol/mundial/sudafrica-2010/calendario-english.html

2. Revisit is a real-time visualization of the latest Twitter messages (tweets) around a specific topic. – http://moritz.stefaner.eu/projects/revisit/

3. The 140 most influential Twitter users – http://informationarchitects.jp/c140/

4. The Evolution of Privacy on Facebook – http://mattmckeon.com/facebook-privacy/

5. A Visual History of the American Presidency – http://timeplots.com/potus/

6. An infographic résumé – http://joaocorreia.pt/infographics/infographic-resume/. A very interesting way of presenting a résumé, well suited to jobs in the design industry.

7. Japan, the Strange Country – by Kenichi Tanaka http://www.kenichi-design.com/. This one is a video, in Japanese. Unfortunately the English version was deleted, but it is still interesting to watch.

Japan – The Strange Country (Japanese ver.) from Kenichi on Vimeo.

8. ASK KEN™ is a sort of node-link diagram that lets you visually navigate through interconnected topics. – http://askken.heroku.com/ . Note that the site only supports Firefox, Safari and Chrome.

9. Every Country’s #1 at Something – http://www.informationisbeautiful.net/visualizations/because-every-country-is-the-best-at-something/

10. An information graphic for AOL Autos comparing teen drivers to senior drivers, visualizing various statistics on who the safer drivers are. http://gavinpotenza.com/old-young-drivers/

Web Scraping using C#

Yesterday I was browsing the SBS Transit website, looking for an "easy" way to get bus arrival times. Yes, the site has its own mobile version, but I thought I could do more if I had the raw data. Unfortunately, the site does not provide any web services. After searching for a while, I found that someone (Deepak Sarda) had created a simple JSON API for the arrival times at http://sbsnextbus.appspot.com/. What he did was scrape the SBS Iris site and run it on Google App Engine (GAE).

Web scraping sounds old, messy and tedious. In most cases it is hard to keep up with changes: your code will break whenever the site changes its layout or content. Unfortunately, not everyone makes their data readily available via web services or a JSON API. I created a simple app to test the API – http://bit.ly/caXufL – and, curious, I decided to try web scraping on my own using C#.

Before you start web scraping, it is good to understand the site's behaviour. You can use tools like HttpFox (Firefox) or HttpWatch (IE) to replay requests and find out whether or not the site uses cookies. For form posting, you need to know all the input parameters. ASP.NET sites are a special case: you also need to maintain the ViewState. I have two examples below: 1) scraping web content and 2) posting data.

1. Extracting web content – The simplest way to harvest the HTML is to use HttpWebRequest.

    CookieContainer cookies = new CookieContainer();
    string baseUrl = @"web site url";
    HttpWebRequest httpwr = (HttpWebRequest)WebRequest.Create(baseUrl);
    httpwr.Method = "GET";
    httpwr.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8";
    httpwr.ContentType = "text/html";
    httpwr.CookieContainer = cookies; // reuse this container across requests to keep the session

    HttpWebResponse res = (HttpWebResponse)httpwr.GetResponse();
    res.Cookies = httpwr.CookieContainer.GetCookies(httpwr.RequestUri);
    string resStr;
    using (var sr = new System.IO.StreamReader(res.GetResponseStream()))
        resStr = sr.ReadToEnd();

2. Posting data

    string postData = String.Format("__EVENTTARGET=&__EVENTARGUMENT=&param0={0}&param1={1}&param2={2}", "input1", "input2","input3");
    byte[] data = Encoding.UTF8.GetBytes(postData); // System.Text; 'encoding' was undefined in the original

    httpwr = (HttpWebRequest)WebRequest.Create(url);
    httpwr.Method = "POST";
    httpwr.AllowAutoRedirect = true;  // To handle redirection. Set to false if it is not required.
    httpwr.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8";
    httpwr.ContentType = "application/x-www-form-urlencoded";
    httpwr.CookieContainer = cookies;
    httpwr.ContentLength = data.Length;
    //Post content
    using (var stream = httpwr.GetRequestStream())
        stream.Write(data, 0, data.Length);
    res = (HttpWebResponse)httpwr.GetResponse();
    using (var sr = new StreamReader(res.GetResponseStream()))
        resStr = sr.ReadToEnd();
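For the ASP.NET case mentioned earlier, the POST above also needs to echo back the hidden __VIEWSTATE field (and usually __EVENTVALIDATION too) harvested from the previous GET. Here is a minimal sketch of extracting such a field with a regular expression – the sample markup and token value are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

class ViewStateDemo
{
    // Extract a hidden form field (e.g. __VIEWSTATE) from harvested HTML
    // so it can be echoed back in the next POST.
    public static string ExtractHiddenField(string html, string name)
    {
        Match m = Regex.Match(html,
            "id=\"" + Regex.Escape(name) + "\"[^>]*value=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        // Stand-in for a scraped ASP.NET page (hypothetical ViewState token)
        string html = "<input type=\"hidden\" name=\"__VIEWSTATE\" " +
                      "id=\"__VIEWSTATE\" value=\"dDwtNTI2OTA0NTY7Oz4=\" />";
        string vs = ExtractHiddenField(html, "__VIEWSTATE");
        // URL-encode the token when building the POST body
        string postData = "__VIEWSTATE=" + Uri.EscapeDataString(vs) + "&param0=input1";
        Console.WriteLine(postData);
    }
}
```

Plug the resulting postData into example 2 in place of the hand-built string; remember that the token must be URL-encoded, since ViewState is Base64 and often ends with "=".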

You can then extend this further by exposing the harvested data as a web service using the [WebMethod] attribute in ASP.NET.

  [WebMethod]
  public static Data[] GetData(string input)
  {
      // web-scrape the site
      // parse the content
      // store in db or memcache
      return result.ToArray();
  }

In some cases, web scraping is not as straightforward as getting and posting data. Some sites maintain a specific flow based on specific parameters, so you may need to chain several calls (a combination of 1 and 2) to get the result you want. Once the HTML content is harvested, you then need to parse it. I found a good parser for .NET called the Html Agility Pack. It supports XPath, XSLT and LINQ, and is easy to use.

 HtmlDocument doc = new HtmlDocument();
 doc.LoadHtml(resStr); // parse the harvested HTML; SelectNodes returns null if nothing matches
 foreach (HtmlNode form in doc.DocumentNode.SelectNodes("//form[@action]"))
 {
     HtmlAttribute att = form.Attributes["action"];
 }
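If pulling in the Html Agility Pack is not an option, a quick-and-dirty fallback for simple markup is a regular expression. Treat this as a fragile sketch only (regexes break on messy real-world HTML, which is exactly why a real parser is recommended above); the sample form below is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class RegexFallback
{
    // Pull every form's action attribute out of the harvested HTML.
    public static string[] FormActions(string html)
    {
        var actions = new List<string>();
        foreach (Match m in Regex.Matches(html,
            "<form[^>]*\\baction=\"([^\"]*)\"", RegexOptions.IgnoreCase))
            actions.Add(m.Groups[1].Value);
        return actions.ToArray();
    }

    static void Main()
    {
        // Hypothetical scraped page fragment
        string html = "<form method=\"post\" action=\"/search.aspx\"><input name=\"q\" /></form>";
        Console.WriteLine(String.Join(", ", FormActions(html))); // → /search.aspx
    }
}
```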