Tuesday, June 5, 2007

Google Developer Day - Session 2 - A Computing System for the World's Information: A Look Behind the Scenes at Google

The second true session of the day (not including the keynote) was called "A Computing System for the World's Information: A Look Behind the Scenes at Google." This session, presented by Jeff Dean, primarily focused on specific aspects of Google's infrastructure. To summarize their systems' philosophy, build a large amount of cheap machines that are expected to fail, but write extremely reliable software to gracefully handle the failures and milk the hardware as much as possible. By "cheap machines," I mean literally desktop grade hardware, no complicated expensive servers. Mr. Dean joked that ultra-reliable hardware makes programmers lazy. During the presentation, three key internally developed technologies were presented, GFS, MapReduce, and BigTable. I don't plan on going into all the notes I have on these, in fact I don't understand them all myself, but here's some highlights.

GFS is Google's internally developed file system specifically tuned to their unique requirements (did you really think that NTFS could store all the world's information?). Some of these requirements include huge read/write bandwidth and reliability over thousands of cluster nodes. By writing their filesystem from scratch, they can also ensure that it has the reliability necessary for the crappy hardware used. The filesystem helps Google's 200+ clusters, some of which have up to 5000 machines.

MapReduce is a library of code or "system used for expressing a way of computation such that a programmer must write the computation in a certain style," designed for processing lots of data. The system defines two phases, a map phase and a reduce phase. The map phase extracts relevant information from each record of the input, and the reduce phase collects the data together to produce the final output. This technology is useful for batch processes, and helps solve problems in a standard way. Thousands of programs at Google use this. For other information, search Google for "MapReduce" or read this article.

BigTable is another internally developed technology that Mr. Dean described as a higher level API than a filesystem, somewhat like a database, but not as full featured. I understood this as a giant data structure to store and organize a ton of information. Google's crawlers use BigTable
to help index the vast about of information on the Internet. BigTable is also used by a lot of other data heavy applications for their storage on GFS. Here's a paper Mr. Dean wrote on BigTable.

Google Developer Day - Evening Reception

The best part of the day was the evening event at the Googleplex. As I mentioned, the conference took place in San Jose, which is about 25 mins south from the Googleplex in Mountain View. Google provided their renown commuter buses as free transportation between the convention center and their campus. Here's the outside of one of the buses:


And the inside:


There is a giant fleet of these buses which provide free transportation for Google employees to and from work (they even have free WiFi Internet on them). After arriving at the Googleplex, everyone was funneled into the central courtyard between all the buildings. On the way there, we passed the personal swimming pools that have been posted all over the Internet.

Inside the courtyard, it was clear there was a party. In fact, it felt more like a carnival than a corporate reception. There were many white tents setup, each of which had a different type of food.
Check out this picnic table that also acts as a job posting service and a white board (where do they get those things?)!
After filling my stomach on all types of good food, I was able to convince one of the Googlers I met, named JP, to give me a personal tour of one of the buildings. I really wanted to see how the software development areas were setup. JP walked me through registering as a visitor, and we entered one of the buildings (I forget which exactly). It's hard to summarize exactly what the insides of one of the buildings was like. Of course the cube areas felt like office space, but there were plenty of open meeting areas decorated with a similar feel to an Ikea display. I managed to snag this picture of one of there conference rooms (those are two projectors mounted in the ceiling):

Regarding the development stations themselves, I was able to see three different types: a typical cube, an agile island, and two person pod. The typical cube was just that, a typical cube. The "agile island" was a configuration of work stations that I felt was perfect for agile development. Imagine a large open square of about eight work stations, with two in the middle. The large amount of open space is very conducive to free conversation and thought. The two person development pods were a sort of small office made from temporary glass walls. There were two development stations in these pods. Unfortunately I was not able to get pictures of these.
After my mini-tour, we returned to the party. Instead of hanging out in the courtyard, we moved into one of the buildings that were open to entertain the guests as well. I thought I was walking into Dave n Busters downtown Philly than a corporate lobby. There were arcade machines, pool tables, a slideshow running on a huge screen, and disco lights.

It wasn't until I went to the bathroom that I found one of the most best ideas Google had, TOTT (testing on the toilet). In reading distance from any vantage point during a bio-break, developers can review techniques to better test their code!

Furthermore, I did a little poking around when I got home and found that all of these articles are available as PDFs online for other development shops to use! Here's a couple links: http://www.artima.com/forums/flat.jsp?forum=155&thread=192883 and http://googletesting.blogspot.com/

Monday, June 4, 2007

Google Developer Day - Overview and Keynote

Google Developer Day 2007 was a free conference provided by Google. The conference was initially scheduled to happen at the Googleplex, but due to an overwhelming response of attendees, Google moved it to a larger venue, the San Jose Convention center. This was somewhat disappointing because I was looking forward to spending the day at Google's headquarters. The good news was that there was still an evening event scheduled at the Googleplex. As I walked into the lobby of the convention center, the first thing I noticed was all the Google colored decorations. Registration was split into three different tables separated by last name, A-I, K-Q, R-Z. I'm not sure how Google decided on this split, but it didn't seem very efficient, my line (K-Q) was about 10 times longer than the others (but it did move quick)! Here's a picture of the check-in desk:
Elsewhere in the lobby were demo's of Google's Search Appliance and other products, foosball tables, pool tables, ton of free snacks, and a field of bean bag chairs.



Here's a link to YouTube where Google posted videos of all presentations as they were recorded professionally. During each time slot during the day, there were about six possible presentations one could attend, I tried to pick the most interesting or applicable to my work. Here's a full list of all the sessions. I'll post a different blog entry for each session I attended, some will be more extensive than others.

Opening Keynote - Jeff Huber - Vice President, Software Engineering
While waiting for the keynote to begin, there was a repeating video playing demoing many applications built with Google technology, but not all necessarily built by Google. Throughout the day they referenced a couple of them as good examples of how Google's technology could be used (I'll mention some later). Mr. Huber's presentation highlighted what was to come throughout the day, as well as indicating that there were 1,500 developers attending the conference in San Jose, and that there were 160 Google Developer Day sessions to happen throughout the world (Google Developer Day happened in other cities internationally). Everything, including food, was free (paid for by Google).
Mr. Huber continued, explaining that Google Maps opened people's eyes in 2005 to AJAX and how web applications would be revolutionized. Besides the rich client experience that AJAX enabled web applications to have, AJAX also enabled a way for developers to "mashup" pieces of functionality to extend to the original desired purpose. The idea of mashups started when the community of developers reverse engineered the Google Maps code, and then combined the maps with other features such as mapping out a jogging route (as provided by walkjogrun.net).
Next, Mr. Huber described that the web application development is continually requiring less and less low level development, instead, developers merely reusing existing applications/infrastructures and merge them together (mashup or not) with code only existing at the high level. Here's an example: if someone needed to make a website, no one ever starts by writing the code behind the web server, instead they may actually use a WYSIWYG editor to create the pages and the move them to an already existing hosting service.

The presentation defined three stages towards contemporary development of a web application: integrate, reach, and build. The integrate stage involves determining which data or existing services to combine for your application (check out the Google Mashup editor, an way to quickly and easily create a mashup with only a few lines of xml). The reach stage is used to devise a way to drive traffic to your site. Google advertised their Gadgets as a technique to do this. Citing an example, the Pacman gadget, for iGoogle, successfully drove 6.7 million page views to the authors site in a week! Finally, the build stage of web application development is when everything comes together into the fully functional web application in the fastest, best, and cheapest way. Of course it's all good when your web application is running and people are visiting each day, but what happens if the web is not available? In this case, web applications are useless because no one can actually use them! Mr. Huber introduced Google Gears beta, a framework that developers may use to enable users a way to use web applications without a connection to the Internet! In a nutshell, Google Gears leverages the fact RIA focused websites only ever need to contact the server when data is needed (read or write). Why not download a batch of the data to keep locally if this would enable the web application without an Internet connection? If new data is ever needed, or if data needs to be sent to the server, a local queue of changes is maintained for synchronization whenever the connection to the Internet (or whatever server) is restored. The concept of Google Gears sounds promising, but the technology is still in its infancy (nowhere near ready for enterprise use). It seemed that Google was using the conference as a channel for soliciting help for the open source model being used to develop the product (while a Google product, it is BSD licensed). Read more about Google Gears here (http://gears.google.com/).

As Mr. Huber's presentation was concluding, he introduced a surprise guest, Sergey Brin (cofounder of Google). Mr. Brin thanked everyone for attending and spoke briefly about the current "self sustaining" evolution of the Internet. The Internet is not a sentient being on its own, but it's the result of hard work from many people. These people are a key component of its evolution. Regarding Internet search, Mr. Brin described that as good as Internet searching could become, it is only as good as the content on the Internet (and that would be nowhere without the people posting it).