Failure on a MASSIVE scale

Like most people dealing with the COVID-19 pandemic, my life has been rather topsy-turvy these last five weeks. But this past week was notably the worst. Although it was unrelated to COVID, I wish I could have a do-over.



In my day job, I work for a local government agency. About four weeks ago, we were directed to work from home. We all took our laptops home and set up home offices, if we did not already have them. Most of us use laptops rather than towers or desktops because our jobs often take us to locations other than our desks. All we needed to do was “plug in” at home and connect to Wi-Fi, and we were ready to work.

In the evenings and on weekends, I dabble in local politics, as evidenced by my other posts on this site. At the start of this adventure, I had just been re-elected to my position on my local County Democratic Central Committee, where I had also been serving as Chair. As with other California government positions, Central Committee members were elected on March 3, 2020. The Central Committee was due to install its new Executive Board at its next regularly scheduled meeting, set for April 8, 2020.

It turned out that April 8, 2020 was also Passover. Several of our Committee members practice the Jewish faith, and they asked if we could change the date of our April meeting (the reorganization meeting called for in our bylaws). The Committee agreed on April 15, 2020.

That gave me (as the current Chair) and the DCC’s Governance Committee (the committee responsible for conducting the election) about one month to hold this election. That was also about when the various government agencies in California first issued Shelter-in-Place orders, and it became abundantly clear that we would not be able to meet in person to conduct the election. The software developer in me took this as a challenge to build a software solution to the problem.

Over the next four weeks, I built a website using Microsoft’s ASP.NET Model-View-Controller (MVC) framework. MVC has been around for several years now, but I had never had the opportunity to use it to any real extent, so I took this as a chance to develop my skills and provide a useful service at the same time. Along the way, I also learned a little about some of the other platforms Microsoft provides as part of its development suite.

Four weeks later, my site was ready to publish as we prepared to hold our delayed election. I added some images and asked some folks to “test-drive” the site and report any bugs I had missed. Nothing serious turned up, and I was feeling good.

After a day or two of trying to host the site with another web hosting provider, I went back to my Amazon AWS account and fired up a Windows Server instance. Things started going downhill quickly, but I didn’t recognize the red flags.

I used a “t2.micro” instance of Amazon’s EC2 service. This is one of the smallest instance types available, and it has served me well for several years as a Linux host for this website and a handful of others. When I launched the instance with Microsoft Windows, the server took several minutes to boot (RED FLAG) before I could log in.

Once logged in, I noticed that performance was sluggish: routine operations such as opening a File Explorer window or starting the Control Panel took several minutes (RED FLAG).

After several hours of effort (RED FLAG), I managed to install SQL Express and SQL Server Management Studio and configure the databases my application would need. I ran my database creation scripts and was good to go.

Next, I copied my website application files to the server. That took some effort, and I had to abandon the upload at least once (RED FLAG) before the files were in place.

I opened a web browser and navigated to my site, and things looked promising. The home page came up, and I was able to navigate to the public pages on the site (some pages required visitors to register and log in to an account with Admin or other elevated permissions).

I had forgotten to configure the server to allow remote SQL connections, so I had to log in to the server remotely once again, and once again it took a long time (RED FLAG). Eventually the slow process was complete and I was able to access the databases from my development workstation, so I put some final tweaks in place on the database and in the application.

Things looked good from the front end. I was able to use the UI to create some test users and all the various features of the site were working as expected.

After some more testing, I was feeling good. I had designated some users as Admins (out of a sense of fairness, I would need to step back to the role of a participant in this election), and I was buttoning things up when I noticed an error: “The request could not be completed.” (RED FLAG)

This was not an error message I had coded, and I realized it was coming from somewhere in the MVC framework. Of course, the first thing I did was refresh the page, and everything worked again. Additional testing failed to reproduce the error (does it ever?), so it passed out of mind.

On election day, things started normally. About 50 attendees joined the virtual meeting, and it proceeded smoothly until we got to the election. The first race (mine) was uncontested, so I thanked the attendees and moved on to the second race. This one WAS contested, so my website would finally be put to use.

At the appropriate time, the Admin user opened the election for voting. Immediately, I saw the error message “Unable to complete the request.” I could hear the other meeting attendees saying that they, too, were seeing that message. After about 30 seconds of discussing a switch to our backup election system, I could hear some folks saying, “Wait, it’s going through now.” But we went with the backup plan (Google Forms submissions) and conducted our election.

Afterwards, I logged on to the server and started a post-mortem to try to understand what had happened.

I quickly realized that I simply did not have enough resources to serve 50 concurrent voters on that website. In the application log, I found a message indicating that SQL Express had been unable to create a connection instance, which set off a series of failures that brought the website down. Bottom line: this site could handle one or two visitors, but not 50.

When I first got the server into a “running” state, I had created an image of it in the Amazon EC2 console. Now, after the failure, I used that image to run some tests. I launched an instance with 2 GB of RAM (double the t2.micro’s 1 GB) and noticed some changes immediately.

First, the server powered up in just a few seconds rather than a couple of minutes, and I was able to reach the default IIS home page in less than 60 seconds from start to finish. However, creating my databases still took several minutes and used up virtually all the RAM. I also needed to enable the .NET Framework 4.6 feature on that server, since I had not yet done so when I created the image. Adding this feature took about 30 minutes on the original server and about 10 minutes on this 2 GB server. So I still felt that this instance was somewhat suspect with regard to performance.

Next, I launched an instance with 4 GB of RAM and finally saw what I would consider “normal” response times and performance when setting up the databases and adding the missing .NET Framework feature.

Hindsight being 20/20, I now know that I should have used a 4 GB server for this project. I had chosen the economical (free-tier eligible) t2.micro instance to save money, since I was paying for this out of my own pocket. However, since this server was only going to be active for a limited time, I should have used a 4 GB instance from the start.

Windows is very “expensive” in terms of server resources compared to Linux for running a simple web server. A small 1 GB server can host several WordPress sites on Linux, while it would be insufficient for the same task on Windows.

Lesson learned.
