Monday 17 February 2014

Parallel processing of Mandrill API calls in LAMP via a job queue

Have you ever had to send close to a million emails from a LAMP (Linux, Apache, MySQL, PHP) stack, and then had a hard time convincing the customer about its performance?

We did, and we had to come up with an out-of-the-box idea to handle it.

Our organization works extensively on LAMP solutions. In one such project, where we were building a system to send newsletters to the members of an e-commerce portal, we saw that the system was far too slow at sending those emails.

Why?

Because the portal had more than a hundred thousand members, and we were sending the emails one by one (sequentially) through Mandrill API web service calls. Each request had to finish before the next one could start, so the process was extremely slow and frequently overloaded the server.
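To put this bottleneck in numbers, here is a rough back-of-the-envelope estimate in Python. The 0.5 seconds per API call is purely an assumed figure for illustration, not a measurement from this project, and `estimated_hours` is a hypothetical helper. The point is that sequential sending time grows linearly with the number of recipients, while N fully parallel workers divide it by roughly N.

```python
import math

def estimated_hours(num_mails: int, seconds_per_call: float, workers: int = 1) -> float:
    """Rough wall-clock estimate (hypothetical helper) for num_mails API calls.

    Assumes every call takes the same time and that `workers` calls run
    fully in parallel; real latency varies and API rate limits may apply.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Each worker handles at most ceil(num_mails / workers) calls, one after another.
    calls_per_worker = math.ceil(num_mails / workers)
    return calls_per_worker * seconds_per_call / 3600

# Assumed 0.5 s per Mandrill call, 100,000 recipients:
print(round(estimated_hours(100_000, 0.5), 2))              # 13.89
print(round(estimated_hours(100_000, 0.5, workers=10), 2))  # 1.39
```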

At this point we wished PHP supported multithreading, so that we could do much of this work in parallel. Since PHP has no built-in support for multithreading, we came up with the idea of using job queue software (a “job scheduler”) that would give us the same effect indirectly: several worker processes running in parallel, completing the job faster without overloading the server.

We brainstormed and evaluated several such tools, including Beanstalkd, Gearman and Amazon SQS, and finally settled on Beanstalkd.

We preferred Beanstalkd because:
- It was easy to install
- An easy-to-use PHP client was available
- It used very few system resources

Once it was installed, only minor changes to our code were needed to push those emails to a “job queue” instead of sending them to the Mandrill API directly. We then wrote a “worker” script to process the queue: it took each job's information from the queue and sent the email via the Mandrill API.
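A minimal sketch of this producer/worker split, written in Python for illustration only. Python's in-memory `queue.Queue` stands in for a Beanstalkd tube, and `enqueue_newsletter` and `send_via_mandrill` are hypothetical names; a real setup would use a Beanstalkd client's put/reserve/delete operations and an actual Mandrill API call instead. Unlike a long-running Beanstalkd worker, which typically blocks waiting for the next job, this worker simply returns once the queue is empty so that the example terminates.

```python
import json
import queue

# Assumption: an in-memory queue stands in for a Beanstalkd tube.
jobs = queue.Queue()

def enqueue_newsletter(recipients, subject):
    """Producer (hypothetical): push one small job per recipient instead of calling the API."""
    for email in recipients:
        jobs.put(json.dumps({"to": email, "subject": subject}))

def send_via_mandrill(payload):
    """Hypothetical placeholder for the real Mandrill API call."""
    return {"email": payload["to"], "status": "sent"}

def worker(job_queue, sent_log):
    """Worker: take jobs off the queue and send each one until the queue is empty."""
    while True:
        try:
            raw = job_queue.get_nowait()
        except queue.Empty:
            return  # a real Beanstalkd worker would instead block waiting for the next job
        sent_log.append(send_via_mandrill(json.loads(raw)))
        # Mark the job as handled; with Beanstalkd, this is where the job would be deleted.
        job_queue.task_done()

enqueue_newsletter(["a@example.com", "b@example.com"], "February offers")
log = []
worker(jobs, log)
print(len(log))  # 2
```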

Now we had the freedom to start multiple worker scripts, processing jobs in parallel and clearing the queue faster.
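The same idea with several workers, again as a Python sketch under stated assumptions: threads stand in for the separate PHP worker processes (reasonable here because each job spends most of its time waiting on a network call), `queue.Queue` stands in for Beanstalkd, and `send_via_mandrill` and `drain_in_parallel` are hypothetical names. Each worker competes for the next job, so adding workers shortens the time needed to empty the queue without any coordination beyond the queue itself.

```python
import queue
import threading

def send_via_mandrill(to):
    """Hypothetical stand-in for one Mandrill web service call."""
    return f"sent:{to}"

def worker(job_queue, results, lock):
    # Each worker keeps pulling jobs until the shared queue is drained.
    while True:
        try:
            to = job_queue.get_nowait()
        except queue.Empty:
            return
        outcome = send_via_mandrill(to)
        with lock:
            results.append(outcome)
        job_queue.task_done()

def drain_in_parallel(recipients, num_workers):
    """Fill the queue, start num_workers competing workers, and wait for them all."""
    job_queue = queue.Queue()
    for to in recipients:
        job_queue.put(to)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(job_queue, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = drain_in_parallel([f"user{i}@example.com" for i in range(20)], num_workers=4)
print(len(results))  # 20
```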

However, we still had a few possible issues to address before the solution could be deployed:
- What if, for some reason, the Apache web server was restarted? This would kill all the “worker” scripts, and someone would have to start them again manually.
- If higher throughput was required, we had to increase the number of “worker” scripts manually, and kill specific “worker” scripts manually when they were no longer needed.

The solution clearly needed further polishing to handle these issues, so we installed a component called “Supervisor” to manage them. Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems. It also has a web interface from which you can manage the “worker” scripts, starting or stopping individual scripts as and when required.
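For reference, a Supervisor program definition addressing both issues might look like the fragment below. The program name, PHP binary path, worker script path and log path are assumptions. `autorestart=true` makes Supervisor start a worker again whenever it exits, `numprocs` sets how many copies of the worker run, and the `[inet_http_server]` section enables the web interface. Because supervisord runs as its own daemon, independently of Apache, restarting the web server does not take the workers down.

```ini
; Hypothetical program name and paths; adjust to your installation.
[program:mail_worker]
command=/usr/bin/php /var/www/app/worker.php
; With numprocs > 1, process_name must include process_num to keep names unique.
process_name=%(program_name)s_%(process_num)02d
numprocs=4
autostart=true
autorestart=true
stopsignal=TERM
redirect_stderr=true
stdout_logfile=/var/log/supervisor/mail_worker.log

; Built-in web interface for starting and stopping individual workers.
; Add username/password before exposing it beyond localhost.
[inet_http_server]
port=127.0.0.1:9001
```

After changing `numprocs`, running `supervisorctl reread` followed by `supervisorctl update` applies the new worker count.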

So, although PHP does not support multithreading, we managed to overcome this limitation with the help of Beanstalkd and Supervisor.
We hope to share similar stories with you soon. Until then, happy reading!
 
 
Written by Gautam Jumrani and contributed by Laxmikant Purohit
Laxmikant Purohit, Senior Manager at Direction, has more than 15 years of IT experience. He believes in a light and agile philosophy for delivering projects and is a certified Scrum Professional.
Gautam Jumrani, Project Lead for LAMP projects, has more than 10 years of industry exposure. He is always eager to try out new technologies and challenging scenarios.

