Wednesday, June 11, 2014

Platform as a Service

During the past three weeks of blog silence, I've been digging into the Heroku app I set up in my earlier post on deploying to Heroku. Heroku is a Platform as a Service (PaaS) cloud provider. It provides an environment for running web applications written in a variety of programming languages (Ruby, Python, Java, Scala, and Clojure, to name a few) on a variety of frameworks, most notably Ruby on Rails. As that earlier post suggests, my app is a Python app built on web.py. It runs the calculations for a ball/plate system, and I call it from a genetic algorithm (GA) running on a remote client.
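
To make the client/server split concrete, here is a minimal sketch of a generational GA in Python whose fitness function is passed in. In the setup described above, `evaluate` would send a candidate to the Heroku app and return the score from the ball/plate simulation; here a hypothetical quadratic stand-in (`local_fitness`) takes its place so the sketch runs offline. The genome encoding, tournament selection, uniform crossover, and Gaussian mutation are assumptions for illustration, not the actual GA used in this project. The defaults of 50 individuals and 80 generations match the figures in the next paragraph.

```python
import random
from typing import Callable, List, Tuple

Genome = List[float]  # hypothetical encoding: a short vector of real parameters


def tournament(scored: List[Tuple[float, Genome]], rng: random.Random) -> Genome:
    """Size-2 tournament: return the fitter of two randomly chosen individuals."""
    (s1, g1), (s2, g2) = rng.sample(scored, 2)
    return g1 if s1 >= s2 else g2


def run_ga(evaluate: Callable[[Genome], float],
           genome_len: int = 3,
           pop_size: int = 50,
           generations: int = 80,
           mutation_rate: float = 0.1,
           seed: int = 0) -> Tuple[Genome, float]:
    """Maximize `evaluate` with a simple elitist generational GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
           for _ in range(pop_size)]
    best: Genome = []
    best_score = float("-inf")
    for _ in range(generations):
        # One fitness evaluation per individual; in the real setup each of
        # these calls would be a request to the remote simulation.
        scored = [(evaluate(g), g) for g in pop]
        for score, g in scored:
            if score > best_score:
                best, best_score = g[:], score
        next_pop = [best[:]]  # elitism: carry the best genome found so far
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            # Uniform crossover, then occasional Gaussian mutation per gene.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [x + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate else x
                     for x in child]
            next_pop.append(child)
        pop = next_pop
    return best, best_score


# Hypothetical stand-in for the remote ball/plate evaluation.
def local_fitness(g: Genome) -> float:
    return -sum((x - 0.5) ** 2 for x in g)


best, score = run_ga(local_fitness)
print(best, score)
```

Because every individual is scored once per generation, one run calls `evaluate` 50 × 80 = 4,000 times, which in the real setup means 4,000 remote requests per run.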

The motivation for this work is to determine the feasibility, effectiveness, and efficiency of using cloud platforms to process large amounts of data. The GA is typically run 50-125 times, each time with a population of 50 evolved over 80 generations (200,000-500,000 calculations in total). So far on Heroku, a batch of 125 GA runs takes around 2.5 hours, and I'm still analyzing the results. By comparison, a single GA run takes about 4 hours on my MacBook Air.
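
The calculation counts follow directly from the GA parameters: each run scores every individual in every generation, so a batch costs runs × population × generations evaluations. The sketch below also divides the 2.5-hour Heroku batch by its 125 runs. It assumes nothing about how many runs overlapped on the server, so the result is an averaged wall-clock figure, not a measured per-run time; the function names are hypothetical.

```python
def total_evaluations(runs: int, pop_size: int = 50, generations: int = 80) -> int:
    """Fitness evaluations (simulation calls) for a batch of GA runs."""
    return runs * pop_size * generations


def avg_minutes_per_run(batch_hours: float, runs: int) -> float:
    """Average wall-clock minutes per run across a whole batch."""
    return batch_hours * 60 / runs


print(total_evaluations(50), total_evaluations(125))  # 200000 500000
print(avg_minutes_per_run(2.5, 125))                   # 1.2
```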

I'm documenting this process so that others can find it and use what I've learned when deciding how to process massive amounts of data. Heroku seems able to handle the workload, though it occasionally cuts me off during a GA run, replying with "Request not processed" instead of a result. Even so, the results so far look promising for using Heroku in future big data work.
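
One way a client could cope with those occasional "Request not processed" replies is to retry with exponential backoff. This is a sketch of one possible approach, not necessarily how the client used here handles it. It assumes each remote evaluation is wrapped in a `send` callable that raises a hypothetical `RequestNotProcessed` exception when that reply comes back; the attempt limit and delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RequestNotProcessed(Exception):
    """Hypothetical: raised by `send` when the server replies
    'Request not processed' instead of returning a result."""


def with_retry(send: Callable[[], T],
               max_attempts: int = 5,
               base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send`, retrying with exponentially growing delays on refusal."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except RequestNotProcessed:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the backoff testable without real waiting; in production the default `time.sleep` is used.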

Dr. Remy also set up a version of the GA on Google App Engine (GAE). I haven't worked with it enough to say how it holds up, but GAE's pricing constraints suggest that Heroku is the more viable option for projects that need a lot of uptime. GAE grants 28 free instance-hours a day and cuts you off once they are used up unless you pay for more; those 28 hours are shared by all instances allocated to your app. Dr. Remy reported being cut off after a single batch of 125 GA runs. Heroku takes a different approach, allocating "dynos," where a dyno appears to be one running instance of an app, encompassing all of that app's processes. Heroku charges by the "dyno-hour" and offers 750 free dyno-hours a month. All uptime of a dyno counts toward that month's dyno-hours, but a single dyno can accrue at most 24 dyno-hours a day, so the free allowance covers one dyno running around the clock for the whole month. That is more flexible than GAE's daily cap. We don't yet have performance comparisons between Heroku and GAE, but I'll update the blog when they're available.
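
The quota arithmetic is easy to get wrong, so here is a small sketch. A single always-on dyno accrues 24 dyno-hours per day, or 720 to 744 per month depending on month length, while a daily cap of 28 instance-hours covers only slightly more than one instance running continuously. The function names and the 12-instance example are hypothetical, chosen only to show how several instances drain a shared daily quota quickly.

```python
import calendar


def hours_in_month(year: int, month: int) -> int:
    """Dyno-hours one always-on dyno accrues in the given month."""
    days = calendar.monthrange(year, month)[1]  # (first weekday, number of days)
    return days * 24


def dynos_fit_free_tier(free_hours: float, year: int, month: int, dynos: int = 1) -> bool:
    """True if `dynos` always-on dynos stay within a monthly free allowance."""
    return dynos * hours_in_month(year, month) <= free_hours


def instance_hours(instances: int, hours: float) -> float:
    """Instance-hours used when `instances` each run for `hours`."""
    return instances * hours


GAE_FREE_PER_DAY = 28  # free instance-hours per day, as described above

print(hours_in_month(2014, 6), hours_in_month(2014, 7))  # 720 744
print(GAE_FREE_PER_DAY / 24)  # roughly 1.17 instances running all day
print(instance_hours(12, 2.5) > GAE_FREE_PER_DAY)  # True: 30 > 28
```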