Monday, July 21, 2014

Thanks to a fundamental misunderstanding of how "elites" work (the select few members of one GA generation that get passed on to the next), it's possible that all of our data is invalid. My understanding, based on what I was told about the code and my own examination of it, was that an elite was kept for one generation and discarded if it failed. That is apparently not the case: elites are kept until the end of the GA, regardless of failures along the way.
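To make the distinction concrete, here is a minimal sketch of the two retention policies as I understand them. This is not our actual GA code; the names (top_n, N_ELITES) and the (name, fitness) representation are assumptions made purely for illustration, with a fitness of None standing in for a failed evaluation:

```python
# A minimal, self-contained sketch (not our actual GA code) contrasting the
# two elite-retention policies. Individuals are (name, fitness) pairs, and
# a fitness of None stands in for a failed evaluation (e.g. a PaaS error).

N_ELITES = 2  # illustrative value only


def top_n(individuals, n):
    """Keep the n highest-fitness individuals, ignoring failed ones."""
    ok = [ind for ind in individuals if ind[1] is not None]
    return sorted(ok, key=lambda ind: ind[1], reverse=True)[:n]


def update_elites_assumed(latest_results):
    """What I thought happened: everyone, elites included, is judged on its
    latest evaluation, so an elite that fails is discarded."""
    return top_n(latest_results.values(), N_ELITES)


def update_elites_actual(elites, latest_results):
    """What apparently happens: elites keep their old scores until the GA
    ends, so a later failure never shows up in the recorded results."""
    return top_n(list(elites) + list(latest_results.values()), N_ELITES)


if __name__ == "__main__":
    elites = [("A", 300), ("B", 280)]
    # In this generation, elite "A" actually failed (fitness None).
    results = {"A": ("A", None), "B": ("B", 285), "C": ("C", 150)}
    print(update_elites_assumed(results))         # "A" is dropped
    print(update_elites_actual(elites, results))  # "A" survives on its stale 300
```

Under the second policy, a failed elite rides its stale score to the end of the run, which is exactly how an error could hide under a "great" result in our plots.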
When I looked at the data before, I thought I was seeing 30-40 great results and ~10 errors in the last generation of each population. Now I look at that same data knowing that the 30 elites could be hiding 30 errors underneath them. If that is the case, then virtually all of our data is invalid, because we cannot be certain that the PaaS-es were returning results throughout a run of the GA.
What this means for the project is that a month's worth of data collection and analysis may have to be thrown out the window (worst-case scenario). I'm conferring with Dr. Remy to see how we should move forward.
Friday, July 18, 2014
Where We Are Right Now
Some issues were found with the PaaS2 results, specifically with the errors our clients received. When planning this experiment, we failed to account for the fact that a client machine's speed affects how much uptime a cloud app accumulates. The implication is that, on services that cap an app's uptime, the same sequence of requests will consume a different share of that cap depending on how fast the client is.
Say you have two clients talking to two instances of the same app. Each instance has 4 hours of allotted uptime per day, and once that uptime is exceeded any client sending requests to these apps will receive only 500 errors in response. One of your clients causes 20 minutes of uptime per run of a program, and the other causes 30 minutes per run of an identical program; this is due to hardware differences between the two clients. If you are required to run this program 9 times per day per app/client pair, and each client is constrained to one app, the app running 20 minutes per client program will not exceed its allotted time, but the app running 30 minutes will.
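The arithmetic for that hypothetical example (illustrative numbers only, not real measurements from our runs) checks out in a few lines of Python:

```python
# Illustrative numbers from the example above, not real measurements.
QUOTA_MIN = 4 * 60   # 4 hours of allotted uptime per app per day
RUNS_PER_DAY = 9     # required program runs per app/client pair

for client, minutes_per_run in [("fast client", 20), ("slow client", 30)]:
    used = RUNS_PER_DAY * minutes_per_run
    status = "within quota" if used <= QUOTA_MIN else "quota exceeded -> 500s"
    print(f"{client}: {used} min of {QUOTA_MIN} min used ({status})")

# fast client: 180 min of 240 min used (within quota)
# slow client: 270 min of 240 min used (quota exceeded -> 500s)
```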
This is an oversimplification of our problem. First, we have to factor in that we don't know how much time one request to an app instance eats up on the server end. Conditions on the server side may change, and the vendor's method for measuring uptime is unknown and seems to involve factors beyond straight app uptime. In addition, we have to account for travel time to the server, which is affected by network conditions in ways we cannot predict. This adds to our client uptime, which in turn affects our app uptime.
There are a few morals here: "careful planning can still surprise you with bumps," "networking is complex," etc. The biggest, though, is that we have to restart data collection on PaaS2 for 2 of our 3 clients.
I restarted yesterday, and the results are good, but PaaS2's dashboard did show error replies in some of the runs. Apparently we haven't yet hit the sweet spot where they occur so rarely that we don't have to worry about them.
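One cheap guard going forward would be to flag any run whose responses include a 5xx status code, so quota errors can't hide inside otherwise good-looking aggregates. The sketch below uses the requests library with a placeholder URL and payload list; it is not our actual client code:

```python
# Sketch: flag a run if any request came back with a 5xx status code.
# The endpoint URL and payloads are placeholders, not our real app.
import requests


def run_is_clean(url, payloads, timeout=30):
    """Return True only if every request in the run succeeded."""
    statuses = []
    for payload in payloads:
        resp = requests.post(url, json=payload, timeout=timeout)
        statuses.append(resp.status_code)
    bad = [code for code in statuses if code >= 500]
    if bad:
        print(f"run hit {len(bad)} server error(s): {sorted(set(bad))}")
    return not bad
```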
Thursday, July 17, 2014
Some More Results
After a week of tests, we have some results in. I'm hesitant to post the boxplots of our timing measurements because there is an issue, caught only after the fact, affecting exactly one PaaS client running on one of our machines. Those tests are being re-run until we have a matching number of results.
Our control client, a computer designed to handle complex operations like the multiprocess GA, performed essentially as expected. Runtimes for individual populations are tight around the median time, and even the outliers are not egregiously different. One PaaS clearly has more consistent timing than the other, but the less consistent PaaS has faster completion times.
The client representing the Average Joe computer also performed about as expected. It is clearly slower than the control client, but can still run a whole population in a reasonable amount of time. There are many more outliers on both PaaS plots for this client, but the same trend appears: one PaaS is more consistent in timing and the other is somewhat faster. One of the PaaS-es also returned to this client the only (legitimate) error codes seen on the plots.
I would talk about the third client, but it's undergoing re-testing on one PaaS, so that discussion will wait until all results are in.
Thursday, July 3, 2014
Preliminary Results
Testing has been going steadily on Newton (hence the image of Sir Isaac Newton in the last post) for a week now. The results of the computations on the cloud end are promising: barring bad requests, the best individuals come out of the calculations with fitness scores in the 200-300 range (read: good). On Heroku, the runtime for a set of 10 concurrent GAs is ~1430s; on GAE, that time is closer to 900s.
We had to back off from testing on Walter, the lab's Raspberry Pi, because he, sadly, cannot create enough forked processes to handle the GA. Attempts on the mystery third client, an IaaS instance, were also largely squashed by resource constraints, so we have moved to a different IaaS, and testing has commenced there.
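For context on why Walter struggled: the client drives a set of 10 concurrent GAs, presumably with roughly one forked worker process per GA run. The sketch below shows that general shape with a hypothetical run_one_ga function; it is not our real client, which differs in detail:

```python
# Sketch of the general client shape: one forked worker per concurrent GA
# run. run_one_ga is a hypothetical placeholder, not our real driver.
import multiprocessing as mp


def run_one_ga(run_id):
    """Placeholder for driving a single GA's requests against a PaaS app."""
    return run_id  # the real function would return timing/fitness data


if __name__ == "__main__":
    NUM_CONCURRENT_GAS = 10  # one process per GA in a set
    with mp.Pool(processes=NUM_CONCURRENT_GAS) as pool:
        results = pool.map(run_one_ga, range(NUM_CONCURRENT_GAS))
    print(results)
```

Ten workers of this kind, each holding its own population and network state, is more than a Raspberry Pi's resources comfortably allow.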