
A blog about software development best practices, how-tos, and tips from practitioners.

Lessons learned from testing a web application for 150,000 simultaneous users

For most people, office break-room conversations start with "Did you watch...?" Research by MarketingProfs estimates that while Americans receive over 206 channels on average, they tend to watch only 20. It also estimates that in 2015, 67% of American households owned digital TV services. However, with real-time traffic exploding on the Internet, TV viewing is undergoing a fundamental shift. Live TV and Video on Demand are becoming commonplace as high-speed internet spreads across most emerging economies. Nielsen expects the Video on Demand market to grow at a compound annual growth rate of 8.3% through 2021. As more people turn to digital channels to watch television, media and entertainment companies are exploring many ways to attract this audience by offering live streaming and Video on Demand services.

Our client, a leading entertainment broadcaster, offers its viewers a wide range of entertainment channels covering television serials, movies, and sports. To serve its online audience, it runs a Video on Demand platform that lets viewers subscribe and watch their preferred content at their convenience. The client had also built a web application for the EURO 2016 football tournament. Amongst other things, it let viewers live stream the matches, watch highlights, and view the schedule of upcoming matches. The application clearly had to perform well. A rise in concurrent users from multiple locations across India and the rest of the Indian subcontinent could not be allowed to degrade its performance in any way.

 

The client approached us to address these concerns. They wanted a robust testing strategy that would assess the application's behavior and ensure it did not run into any snags during this high-profile sporting event. Our team had to make sure that the strategy was comprehensive, easy to execute, and delivered tangible results. To achieve this, we developed detailed test plans, designed load models, and chose the right testing tools. We also built an understanding of the application and prepared the JMeter and Load Impact scripts for load testing. There was a lot of work involved, and we had to complete it within stringent timelines. Given our strong testing focus, we navigated the project successfully, but like everything else, this journey had its share of challenges. Here is how we went about it:

 

Reporting and Analysis

We began by developing a strong performance test strategy and test plan. Over 300,000 users from multiple geographical locations would access the application at peak time, and a sudden spike of 100,000 users was likely when live matches began. We therefore had to make sure that the load models and performance scripts were based on the application's key user scenarios. The client wanted us to run the performance tests with the Load Impact tool. The challenge was that this tool did not generate very insightful reports, and analyzing them took additional effort. To mitigate this, we analyzed the results after each execution cycle and presented them in a simple, easy-to-understand format.
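The post does not say what the simplified reports contained. Here is a minimal sketch of a per-cycle summary: sample count, error rate, mean, 90th and 95th percentile, and maximum response time. It assumes each execution cycle's raw results can be exported as (elapsed milliseconds, success) pairs. That input format and the `summarize` and `percentile` helpers are hypothetical and are not part of Load Impact. For comparison, JMeter's CSV results files do carry similar `elapsed` and `success` columns.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class CycleSummary:
    samples: int
    error_rate: float  # fraction of failed requests
    mean_ms: float
    p90_ms: float
    p95_ms: float
    max_ms: float


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(results):
    """Summarize one execution cycle.

    results: iterable of (elapsed_ms, success) tuples (hypothetical export format).
    """
    results = list(results)
    if not results:
        raise ValueError("no samples in this cycle")
    elapsed = sorted(ms for ms, _ in results)
    failures = sum(1 for _, ok in results if not ok)
    return CycleSummary(
        samples=len(results),
        error_rate=failures / len(results),
        mean_ms=statistics.fmean(elapsed),
        p90_ms=percentile(elapsed, 90),
        p95_ms=percentile(elapsed, 95),
        max_ms=elapsed[-1],
    )
```

Reporting percentiles next to the average is a deliberate choice. A small share of very slow requests can hide behind a healthy mean.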

 

Management of TCP Connections

Given the volume of traffic, we had to make sure that the Load Impact load engines could handle the huge number of TCP connections opened by each virtual user on every engine. The load engines in use when we took over the project could not, and this negatively affected the application's performance. To address this, we coordinated with the Load Impact technical experts, who suggested adding more load engines. We also carried out fail-over testing and found that the application took over 9 seconds to recover. After optimization, the recovery time dropped to 3 seconds, and the application could easily handle sudden spikes in users.
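The post does not say how the number of additional engines was chosen. A back-of-envelope estimate divides the total connection count by what one engine can safely hold. This sketch assumes three hypothetical parameters: connections opened per virtual user, a per-engine connection limit, and a safety headroom. None of them is a Load Impact setting.

```python
def engines_needed(virtual_users: int,
                   connections_per_vu: int,
                   max_connections_per_engine: int,
                   headroom_pct: int = 20) -> int:
    """Rough count of load engines needed to hold all TCP connections.

    headroom_pct keeps each engine below its connection limit; it is an
    assumed safety margin, not a tool setting.
    """
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    total_connections = virtual_users * connections_per_vu
    # Usable connections per engine, scaled by 100 so all math stays in integers.
    usable_x100 = max_connections_per_engine * (100 - headroom_pct)
    # Integer ceiling of total_connections / (usable_x100 / 100).
    return (total_connections * 100 + usable_x100 - 1) // usable_x100
```

Suppose each virtual user opens six connections, roughly what a browser may open per host over HTTP/1.1, and each engine has an assumed limit of 50,000 connections. Then `engines_needed(150_000, 6, 50_000)` asks for 23 engines.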

 

Script Execution using Specific Geographical Locations

The client required the load engines to be located in India or elsewhere in the Indian subcontinent. The problem was that the Load Impact and BlazeMeter tools we were using did not have engines in this region. We worked around this by identifying the load engine locations nearest to the Indian subcontinent and using Asia-specific locations such as Tokyo, Hong Kong, Singapore, Seoul, and Japan. This ensured that the scripts ran from locations close to where most of the real users were.
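The post does not say how virtual users were divided among these Asian locations. The sketch below assumes hypothetical integer weights per location, for example reflecting proximity to the main user base. It uses the largest-remainder method, which keeps per-location counts proportional to the weights while guaranteeing they add up exactly to the planned total.

```python
from __future__ import annotations


def allocate_users(total_users: int, weights: dict[str, int]) -> dict[str, int]:
    """Split total_users across load-engine locations in proportion to weights.

    Largest-remainder method: per-location counts always sum to total_users.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    # Floor share for every location, plus the part lost to truncation.
    shares = {loc: total_users * w // weight_sum for loc, w in weights.items()}
    remainders = {loc: total_users * w % weight_sum for loc, w in weights.items()}
    leftover = total_users - sum(shares.values())
    # Give the leftover users to the locations with the largest remainders.
    for loc in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[loc] += 1
    return shares
```

For example, `allocate_users(100_000, {"Singapore": 3, "Hong Kong": 2, "Tokyo": 1, "Seoul": 1})` gives Singapore 42,857, Hong Kong 28,571, and Tokyo and Seoul 14,286 each. The weights here are purely illustrative.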

 

 Simultaneous Spike and Load Testing 

Like every other software project, this one had some requirement changes midway. Between execution cycles, the client asked for spike and load testing to be conducted simultaneously. This meant a lot of extra work on our end. Our testing team, however, handled it successfully. We first updated the scripts. Then, in the subsequent executions, we allocated 50% of the user load to the spike test and the other 50% to the load test, as per the defined load model.
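The post does not describe the load model in detail. One way to express a combined profile is to ramp half of the users steadily while the other half arrive as a sudden spike. This minimal sketch shows that shape. The function name, ramp length, spike start, and spike duration are hypothetical defaults. Real tools such as JMeter, Load Impact, or BlazeMeter configure these profiles in their own ways.

```python
def target_users(minute: int, peak_users: int, spike_pct: int = 50,
                 ramp_minutes: int = 30, spike_start: int = 45,
                 spike_minutes: int = 10) -> int:
    """Total virtual users at a given minute of a combined load + spike test.

    spike_pct of peak_users form the spike group; the rest form the
    steady-load group. All timings are hypothetical defaults.
    """
    if not 0 <= spike_pct <= 100 or ramp_minutes <= 0:
        raise ValueError("invalid profile parameters")
    spike_users = peak_users * spike_pct // 100
    load_users = peak_users - spike_users
    # Steady-load group: linear ramp from 0 to load_users, then hold.
    elapsed = max(0, min(minute, ramp_minutes))
    steady = load_users * elapsed // ramp_minutes
    # Spike group: all of its users arrive together and leave together.
    in_spike = spike_start <= minute < spike_start + spike_minutes
    return steady + (spike_users if in_spike else 0)


if __name__ == "__main__":
    # Print the planned user count at a few checkpoints of the run.
    for m in (0, 15, 30, 45, 54, 55):
        print(m, target_users(m, peak_users=100_000))
```

Describing both groups in one function makes it easy to confirm that the combined peak never exceeds the planned total for a run.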

Beyond the technology challenges, our team also had to coordinate with the server hosting provider (Amazon Web Services, in this case), because server warm-up or upscaling required 12 hours' advance notice. After every execution cycle, we also analyzed the server utilization reports obtained from the hosting team, looking for bottlenecks before moving on to the next testing cycle. Additionally, we had to give the BlazeMeter professional services team 3 days' advance notice for any day on which we would test the application with more than 100,000 users.
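The two lead times above are easy to miss when test dates shift, so the latest notification times for a planned run can be computed up front. This sketch assumes that every run needs a server scale-up and that the 100,000-user threshold is strict. The function and the dictionary keys are hypothetical labels, not part of any AWS or BlazeMeter API.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Lead times as described in the post; the names are illustrative.
HOSTING_NOTICE = timedelta(hours=12)
LOAD_TOOL_NOTICE = timedelta(days=3)
LOAD_TOOL_THRESHOLD_USERS = 100_000


def notice_deadlines(test_start: datetime, peak_users: int) -> dict[str, datetime]:
    """Latest time to notify each party before a run starting at test_start."""
    # Assumption: every run needs the hosting provider to warm up or upscale servers.
    deadlines = {"hosting_provider": test_start - HOSTING_NOTICE}
    # Only runs above the user threshold need the load-tool team warned in advance.
    if peak_users > LOAD_TOOL_THRESHOLD_USERS:
        deadlines["blazemeter_services"] = test_start - LOAD_TOOL_NOTICE
    return deadlines
```

Take a hypothetical 150,000-user run starting at 18:00 on 10 June 2016. The hosting provider must be told by 06:00 that day, and the BlazeMeter team by 18:00 on 7 June.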

By the end of the project, we had reduced the average response time to under 1 second, down from 5 seconds. The optimization work ensured that the application performed well with 300,000 users. It could also handle a spike of up to 150,000 users with good response times during the live football matches. With the optimized server configuration and application code in place, the application served a large number of users without any performance snags.
