Thursday, November 04, 2010

Using the Task Parallel Library with ASP.NET MVC for scalable web applications

Get the full sample solution from GitHub: https://github.com/mikehadlow/Suteki.AsyncMvcTpl

Most applications spend most of their time doing IO operations. Communicating with a database or web service over the network, or reading and writing files, can take orders of magnitude longer than running instructions on the CPU. Web applications are no different, but have an additional complexity in that they are multi-threaded by nature, with each request being served on its own thread. IIS only maintains a threadpool of a certain size for handling requests, and once all the threads are consumed, requests start to queue. You can mitigate this by doing asynchronous IO.

Using the new Task Parallel Library in .NET 4.0 makes building asynchronous controller actions much easier, and this post will show you how.

Here’s a common scenario. I have a controller action that makes a couple of calls to a service class:

[HttpGet]
public ViewResult Index()
{
    var user = userService.GetCurrentUser();
    userService.SendUserAMessage(user, "Hi From the MVC TPL experiment");
    return View(user);
}

The two methods I call on my UserService in turn make calls on two further classes:

public User GetCurrentUser()
{
    const int currentUserId = 10;
    return userRepository.GetUserById(currentUserId);
}

public void SendUserAMessage(User user, string message)
{
    emailService.SendEmail(user.Email, message);
}

Yes, this is an entirely contrived and simplified example, but the essentials are in place.

The UserRepository in turn uses ADO.NET to make a call to SQL Server to retrieve a row from the database:

public class UserRepository
{
    public User GetUserById(int currentUserId)
    {
        using (var connection = new SqlConnection("Data Source=localhost;Initial Catalog=AsncMvcTpl;Integrated Security=SSPI;"))
        {
            connection.Open();
            using (var command = connection.CreateCommand())
            {
                // Use the id that was passed in rather than a hard-coded value.
                command.CommandText = "select * from [user] where Id = @id";
                command.Parameters.AddWithValue("@id", currentUserId);
                command.CommandType = CommandType.Text;

                using (var reader = command.ExecuteReader())
                {
                    if (reader.HasRows)
                    {
                        reader.Read();
                        return new User((int)reader["Id"], (string)reader["Name"], (string)reader["Email"]);
                    }
                    throw new ApplicationException("No row with Id = " + currentUserId + " in user table");
                }
            }
        }
    }
}

The EmailService sends an email using the standard System.Net.Mail.SmtpClient:

public void SendEmail(string emailAddress, string message)
{
    var mailMessage = new MailMessage("info@suteki.co.uk", emailAddress)
    {
        Subject = "Hello!",
        Body = "An important message from Nigeria :)"
    };

    var smtpClient = new SmtpClient("smtp.suteki.co.uk");
    smtpClient.Send(mailMessage);
}

Now, both the call to the database and the email dispatch are IO intensive. Most of the time, the IIS thread that the action executes on will be idle, waiting for first the database and then the email call to return. Even though the thread is mostly idle, it is still not available to process other requests. At some reasonable load, the requests will start to queue and eventually IIS will give up entirely and issue a 503 ‘server busy’ error.

However, both the SqlCommand.ExecuteReader and the SmtpClient.Send methods have asynchronous versions. Under the bonnet these use operating system IO completion ports, which means that the thread handling the request can be returned to the IIS threadpool while the IO operation completes.

The trick is to make sure that the entire stack from the action method down to the BCL call (the BeginExecuteReader and SendAsync calls) knows how to work asynchronously. Prior to the Task Parallel Library, this would have meant implementing BeginXXX and EndXXX methods at every layer (the Asynchronous Programming Model), so we would have had UserService.BeginGetCurrentUser and UserService.EndGetCurrentUser etc, and then on the UserRepository, BeginGetUserById / EndGetUserById. The code starts to look pretty formidable as well.

With the Task Parallel Library, we can represent any async operation as a Task or Task<T>. The first job is to wrap the BCL SqlCommand and SmtpClient calls as tasks. For BCL classes that implement BeginXXX and EndXXX methods we can just use the task factory’s FromAsync method, and that’s what we do here with the UserRepository’s GetUserById method:

public Task<User> GetUserById(int currentUserId)
{
    var connection =
        new SqlConnection("Data Source=localhost;Initial Catalog=AsncMvcTpl;Integrated Security=SSPI;Asynchronous Processing=true");
    connection.Open();
    var command = connection.CreateCommand();
    // Use the id that was passed in rather than a hard-coded value.
    command.CommandText = "select * from [user] where Id = @id";
    command.Parameters.AddWithValue("@id", currentUserId);
    command.CommandType = CommandType.Text;

    var readerTask = Task<SqlDataReader>.Factory.FromAsync(command.BeginExecuteReader, command.EndExecuteReader, null);
    return readerTask.ContinueWith(t =>
    {
        var reader = t.Result;
        try
        {
            if (reader.HasRows)
            {
                reader.Read();
                return new User((int)reader["Id"], (string)reader["Name"], (string)reader["Email"]);
            }
            throw new ApplicationException("No row with Id = " + currentUserId + " in user table");
        }
        finally 
        {
            reader.Dispose();
            command.Dispose();
            connection.Dispose();
        }
    });
}

Now GetUserById returns a Task<User> instead of a User. Effectively we are saying that the caller will get a user at some point in the future.

Once the reader has been wrapped in a task we can consume it with a continuation supplied to the ContinueWith method. The continuation can return a value, in our case a User, that then pops out of the ContinueWith method as the promise of a future user: a Task<User>.
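If you want to try the FromAsync and ContinueWith pattern without a database, here is a minimal sketch that applies the same two steps to Stream.BeginRead / Stream.EndRead. The FromAsyncSketch class and ReadTextAsync method are names I made up for illustration; they are not part of the sample solution.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FromAsyncSketch
{
    // Wraps the APM pair Stream.BeginRead / Stream.EndRead in a Task<int>,
    // then turns the byte count into a string with a continuation.
    // (Hypothetical helper, for illustration only.)
    public static Task<string> ReadTextAsync(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];

        Task<int> readTask = Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

        // The continuation runs once the read has completed. Its return value
        // becomes the result of the Task<string> handed back to the caller.
        return readTask.ContinueWith(
            t => Encoding.UTF8.GetString(buffer, 0, t.Result));
    }
}
```

Calling FromAsyncSketch.ReadTextAsync(new MemoryStream(Encoding.UTF8.GetBytes("hello")), 16).Result gives back "hello". Just as with GetUserById, the caller only ever sees a task of the final value, never the intermediate read.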

SmtpClient doesn’t use APM, so turning its async operation into a task is a little trickier. Luckily, this has already been done for us in the TPL team’s Parallel Extensions Extras library. I’m going to use that library’s SendTask extension method here:

public Task SendEmail(string emailAddress, string message)
{
    var mailMessage = new MailMessage("info@suteki.co.uk", emailAddress)
    {
        Subject = "Hello!",
        Body = "An important message :)"
    };

    var smtpClient = new SmtpClient("smtp.suteki.co.uk");
    return smtpClient.SendTask(mailMessage, null);
}

Because the synchronous version of this method returned void, we simply return Task here.
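For the curious, an extension method like this can be built by bridging the completed event to a TaskCompletionSource. The following is only a sketch of that general technique, not the library's actual code: FakeSender is a hypothetical stand-in for an event-based class such as SmtpClient, and it assumes only one send is in flight per sender at a time.

```csharp
using System;
using System.ComponentModel;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical event-based async class standing in for SmtpClient.
public class FakeSender
{
    public event EventHandler<AsyncCompletedEventArgs> SendCompleted;

    public void SendAsync(string message, object userToken)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            // Pretend that a null message makes the send fail.
            Exception error = message == null ? new ArgumentNullException("message") : null;
            var handler = SendCompleted;
            if (handler != null) handler(this, new AsyncCompletedEventArgs(error, false, userToken));
        });
    }
}

public static class FakeSenderExtensions
{
    // Bridges the completed event to a Task via TaskCompletionSource.
    // Assumes a single outstanding send per sender.
    public static Task SendTask(this FakeSender sender, string message)
    {
        var tcs = new TaskCompletionSource<object>();
        EventHandler<AsyncCompletedEventArgs> handler = null;
        handler = (s, e) =>
        {
            sender.SendCompleted -= handler; // unsubscribe so the handler fires once
            if (e.Error != null) tcs.TrySetException(e.Error);
            else if (e.Cancelled) tcs.TrySetCanceled();
            else tcs.TrySetResult(null);
        };
        sender.SendCompleted += handler;     // subscribe before starting the send
        sender.SendAsync(message, null);
        return tcs.Task;
    }
}
```

new FakeSender().SendTask("hi") returns a task that runs to completion, while passing null produces a faulted task carrying the ArgumentNullException.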

Now the power of the TPL really starts to work for us. We hardly have to change our UserService methods at all:

public Task<User> GetCurrentUser()
{
    const int currentUserId = 10;
    return userRepository.GetUserById(currentUserId);
}

public Task SendUserAMessage(User user, string message)
{
    return emailService.SendEmail(user.Email, message);
}

We are simply passing Task<User> and Task back up the stack, no need to implement Begin/End methods. This is very powerful.

Finally we come to the action method. This is where the code gets gnarly again. We have to leave the lovely world of TPL and shoehorn it into the crappy MVC async controller:

public class HomeController : AsyncController
{
    readonly UserService userService = new UserService();

    [HttpGet]
    public void IndexAsync()
    {
        AsyncManager.OutstandingOperations.Increment();
        userService.GetCurrentUser().ContinueWith(t1 =>
        {
            var user = t1.Result;
            userService.SendUserAMessage(user, "Hi From the MVC TPL experiment").ContinueWith(t2 =>
            {
                AsyncManager.Parameters["user"] = user;
                AsyncManager.OutstandingOperations.Decrement();
            });
        });
        
    }

    public ViewResult IndexCompleted(User user)
    {
        return View(user);
    }
}

Jeff Prosise has a very nice post about the ASP.NET MVC async controller. You’d probably want to read that first. But simply put, you inherit from AsyncController and rather than calling your action ‘Index’ you split it into two and call the halves IndexAsync and IndexCompleted. The router understands this convention and correctly routes Home/Index to the IndexAsync action.

AsyncController has an AsyncManager that keeps track of outstanding async operations through the Increment/Decrement methods of its OutstandingOperations counter. When the count gets back to zero, IndexCompleted is called. You can marshal state to the completed method using the AsyncManager’s Parameters dictionary as shown.

Once again we are using ContinueWith to supply continuations to grab the result of GetCurrentUser and then wait for the SendUserAMessage to complete.
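Nesting one ContinueWith inside another works, but it gets deeper with every step. An alternative is to return the inner task from the continuation and flatten the result with the TPL's Unwrap extension method. Here is a sketch of that idea; GetCurrentUser and SendUserAMessage are simplified hypothetical stand-ins for the service methods, with a string for the user and an int result so there is something to inspect.

```csharp
using System;
using System.Threading.Tasks;

public static class ChainingSketch
{
    // Hypothetical stand-ins for the service calls; each completes on a pool thread.
    static Task<string> GetCurrentUser()
    {
        return Task.Factory.StartNew(() => "mike");
    }

    static Task<int> SendUserAMessage(string user, string message)
    {
        return Task.Factory.StartNew(() => (user + ": " + message).Length);
    }

    // Returning the inner task from the continuation yields a Task<Task<int>>.
    // Unwrap collapses it into a single Task<int> that completes when the
    // second operation does.
    public static Task<int> GreetCurrentUser()
    {
        return GetCurrentUser()
            .ContinueWith(t => SendUserAMessage(t.Result, "hi"))
            .Unwrap();
    }
}
```

ChainingSketch.GreetCurrentUser().Result is 8, the length of "mike: hi". In the controller this would leave just one final continuation to set the parameter and call Decrement.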

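Note that I haven’t considered exception handling in this example. You need to be very careful that you catch and report exceptions that happen in async operations: if a continuation throws before it calls Decrement, the count never gets back to zero and IndexCompleted is never called.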
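As a starting point, a continuation can check IsFaulted before it touches Result, since reading Result on a faulted task rethrows the failure wrapped in an AggregateException. This is a minimal sketch under my own assumptions rather than code from the sample: FaultHandlingSketch, its GetCurrentUser and the error message are all made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class FaultHandlingSketch
{
    // Hypothetical operation that can fail asynchronously.
    static Task<string> GetCurrentUser(bool fail)
    {
        return Task.Factory.StartNew(() =>
        {
            if (fail) throw new InvalidOperationException("database unavailable");
            return "mike";
        });
    }

    // Turns either outcome into a message, so a fault never goes unobserved.
    public static Task<string> Describe(bool fail)
    {
        return GetCurrentUser(fail).ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                // Reading t.Exception marks the fault as observed. The original
                // exception is the InnerException of the AggregateException.
                return "failed: " + t.Exception.InnerException.Message;
            }
            return "ok: " + t.Result;
        });
    }
}
```

FaultHandlingSketch.Describe(true).Result is "failed: database unavailable" and Describe(false).Result is "ok: mike". In the controller the same check would let both branches reach the Decrement call, with the error passed along through the Parameters dictionary or logged.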

Now for my gripe. It would have been really slick if the MVC team had used TPL to implement async controller methods. I should be able to keep my single Index action but return a Task<ViewResult> instead. My async Index action would then look like this:

[HttpGet]
public Task<ViewResult> Index()
{
    return from user in userService.GetCurrentUser()
           from _ in userService.SendUserAMessage(user, "Hi From the MVC TPL experiment")
           select View(user);
}

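Update: Craig Cav has implemented an async controller that does just this. You can read his post about it here: http://craigcav.wordpress.com/2010/12/23/asynchronous-mvc-using-the-task-parallel-library/ He’s forked my example and demonstrates the Task based async controller here: https://github.com/CraigCav/Suteki.AsyncMvcTpl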

Oh yes, tasks are monadic so you can compose them with Linq. Well, not out of the box, but the Linq extensions can be found in the TaskParallelExtensions library. See my post here for more info.
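To give a feel for what such an extension involves, here is a sketch of a SelectMany for Task<T>, which is the method the C# compiler looks for when a query has two from clauses. It is my own minimal version, not the library's code, and it only covers Task<T>; a non-generic Task like the one SendUserAMessage returns would need a further overload.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskLinqSketch
{
    // Runs 'bind' on the first task's result, waits for the task it returns,
    // then combines both results with 'project'.
    public static Task<TResult> SelectMany<T, TMid, TResult>(
        this Task<T> source,
        Func<T, Task<TMid>> bind,
        Func<T, TMid, TResult> project)
    {
        return source.ContinueWith(t =>
        {
            T first = t.Result;
            return bind(first).ContinueWith(u => project(first, u.Result));
        }).Unwrap(); // flatten Task<Task<TResult>> into Task<TResult>
    }
}

public static class TaskLinqDemo
{
    // Hypothetical helper that produces a value on a pool thread.
    static Task<int> Later(int value)
    {
        return Task.Factory.StartNew(() => value);
    }

    // The second 'from' depends on the result of the first.
    public static Task<int> Sum()
    {
        return from a in Later(2)
               from b in Later(a * 10)
               select a + b;
    }
}
```

TaskLinqDemo.Sum().Result is 22. The query syntax is only sugar over the same ContinueWith and Unwrap calls, so no thread is blocked while either task is running.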

Should I care about this?

As with most scalability optimisations this makes your code more complex, harder to understand and harder to debug. I’ve been happily building web applications without doing this for years. You should only ever use this technique if you have an immediate scalability concern and you know that it is caused by a threadpool full of threads blocked by long running IO operations.

Having said that, if you do have these kinds of issues, then using the TPL like this makes them easier to solve than the older begin/end spaghetti that you previously would have had to write.

It’s also worth noting that doing this kind of thing will become easier still with C# 5’s new async and await keywords. At that point it might be worth doing this kind of async IO as a matter of course.
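As a rough idea of the difference, here is how the two chained calls might read with async and await, assuming a compiler that supports C# 5 or later. The service methods are hypothetical stand-ins again, with a string for the user.

```csharp
using System.Threading.Tasks;

public static class AwaitSketch
{
    // Hypothetical stand-ins for the service calls.
    static Task<string> GetCurrentUser()
    {
        return Task.Factory.StartNew(() => "mike");
    }

    static Task SendUserAMessage(string user, string message)
    {
        return Task.Factory.StartNew(() => { /* pretend to send an email */ });
    }

    // Each await hands the thread back until the awaited task completes;
    // the compiler generates the continuations for us.
    public static async Task<string> GreetCurrentUser()
    {
        string user = await GetCurrentUser();
        await SendUserAMessage(user, "Hi From the MVC TPL experiment");
        return "greeted " + user;
    }
}
```

AwaitSketch.GreetCurrentUser().Result is "greeted mike". The method reads like the synchronous version but still returns a Task<string> to its caller.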

10 comments:

Ryan said...

I've wanted to take a look at the TPL for a while. Thanks for the example.

Brad said...

Mike, thanks for a great intro to using the TPL within an ASP.NET MVC context. I, too, am not especially pleased with how polluted the controllers become in this scenario.

The first thing that comes to mind though, is to perhaps inherit from controller and implement the async controller using TPL. Have you looked at this option?

Mike Hadlow said...

Hi Brad,

That's a good suggestion. I might have a look at the MVC3 source code and see how hard it would be.

Anonymous said...

Hi, I didn't really get much of it, could you perhaps be more intermediate developer friendly next time and take more time to explain things?

Anonymous said...

Just a recommendation.

Mike Hadlow said...

Hi Anonymous,

Your feedback could be very useful here. If you let me know what concepts you are having trouble with, I can try and improve the post.

Anonymous said...

Great post Mike.

Unknown said...

Hi Mike,
I've forked your example on GitHub and have implemented a controller action invoker to support async actions through Tasks.

You can check it out here:
https://github.com/CraigCav/Suteki.AsyncMvcTpl

And a quick summary on my blog here:
http://craigcav.wordpress.com/2010/12/23/asynchronous-mvc-using-the-task-parallel-library/

Mike Hadlow said...

Hi Craig,

That's excellent work. I had a quick look at doing the same, but was defeated by the spaghetti code in the AsyncController. I didn't realise there was a better implementation in the futures assembly. Thanks for pointing that out.

I'll update the post to point at your post and fork.

Luis Abreu said...

Hello.

Stupid question: what happens if BeginExecuteReader throws? that will probably result in a non observable exception, right? in fact, if result (ie, the data reader) hasn't returned yet, your task.Result property might end up throwing. Since it's outside the try/finally, won't you end up with an orphan connection?