
Controlling Web Services Timeouts with Threads

Posted by dave on Feb 4, 2013 in Blog

We have been working on performance tuning for a new release of a public-facing application. Users hit the site regularly: we average about 1.6M page views per month, with a heavy concentration on a certain set of pages. The most frequently used pages call external RESTful web services to provide analysis of user content. The existing version of this site performs well enough, but we need more flexibility, so we have decided to rewrite it. There are risks to a rewrite, but we thought the added flexibility would be worth it.

One of the issues we have been seeing on the new release is random, sporadic slowdowns in performance. Generally these slowdowns show up as large numbers of sockets in the CLOSE_WAIT state. These sockets were opened by Tomcat to call the RESTful web service. A socket sits in CLOSE_WAIT when the remote end has closed the connection but the local application has not yet closed its side. Sometimes the slowdown is accompanied by a large spike in CPU utilization. We first suspected garbage collection issues, but that doesn’t really explain the CLOSE_WAIT sockets owned by Tomcat. Calling external web services (or internal web services, for that matter) is a common architectural approach, so we needed to test whether a delay in the external web service would reproduce the performance problems in a controlled environment.

Christine and I wrote a JMeter performance test to emulate the most frequently accessed pages of the site (more on that in a separate article). Then I created a mock server to emulate web services that do not return results quickly. I should point out that performance issues in remote web services are hard to diagnose. These issues are NOT necessarily related to the implementation of the web service or its client code; sometimes the internet is just not that fast. Also, sometimes folks run DoS attacks that inflict damage on the remote web service. Nevertheless, this test revealed some interesting results. Running it with 1,000 threads against random delays of 10 to 30 seconds promptly shut off all network access to the server until the network stack recovered. When I finally was able to log back in, I discovered over 100 sockets in CLOSE_WAIT status owned by Tomcat. Eureka!
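The mock server itself is not shown in this post. A minimal sketch of the idea, assuming the JDK's built-in com.sun.net.httpserver.HttpServer, might look like the following. The class name SlowMockServer, the port, and the JSON body are all hypothetical choices made for illustration, not the code used in the test.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical mock of a slow remote web service; not the server used in the article.
class SlowMockServer {
    private static final Random RANDOM = new Random();
    private final ExecutorService handlers = Executors.newCachedThreadPool();
    private final HttpServer server;

    SlowMockServer(int port, final int minDelayMillis, final int maxDelayMillis)
            throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", new HttpHandler() {
            public void handle(HttpExchange exchange) throws IOException {
                try {
                    // Stall before answering, like an overloaded or distant service.
                    Thread.sleep(randomDelay(minDelayMillis, maxDelayMillis));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                byte[] body = "{\"status\":\"ok\"}".getBytes("UTF-8");
                exchange.sendResponseHeaders(200, body.length);
                OutputStream out = exchange.getResponseBody();
                out.write(body);
                out.close();
            }
        });
        // One thread per in-flight request, so slow responses do not block each other.
        server.setExecutor(handlers);
    }

    /** Returns a delay in milliseconds, chosen uniformly from [minMillis, maxMillis]. */
    static int randomDelay(int minMillis, int maxMillis) {
        return minMillis + RANDOM.nextInt(maxMillis - minMillis + 1);
    }

    void start() {
        server.start();
    }

    void stop() {
        server.stop(0);
        handlers.shutdownNow();
    }

    public static void main(String[] args) throws IOException {
        // Delays of 10 to 30 seconds, matching the range described above.
        // Port 8081 is an arbitrary choice for this sketch.
        new SlowMockServer(8081, 10000, 30000).start();
    }
}
```

Pointing the application's web service URL at a server like this one makes the slowdown repeatable, which is what lets a load test expose the problem on demand.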

This test tells me that we want to insulate ourselves from having the web service take a long time to execute.  It’s time for some threading to time out the request and let us keep on trucking.

Back in JDK 1.0.2, we would write a thread pool backed by a Vector of threads. The threads would do the actual call of the web service and the Tomcat thread would wait until notified or timed out. I won’t bore you with the code details. It involved writing an inner class and using synchronized blocks with notify() and wait(). But we are using Java 6 and have java.util.concurrent at our disposal. Let’s see how this can help us out.

We use ReCaptcha as one of our web services. To start, I moved the code that calls ReCaptcha to a separate worker class. This class implements java.util.concurrent.Callable. Since we need a Boolean to tell us if the captcha was OK, we use the generic interface Callable<Boolean>. The code is pretty simple. We put the guts of the ReCaptcha call in the call() method and return a Boolean.

    // Checks a captcha answer on a worker thread so the caller can apply a timeout.
    // "logger" is assumed to be a static field of the enclosing class.
    static class ReCaptchaWorker implements Callable<Boolean> {
        private final ReCaptcha reCaptcha;
        private final String remoteAddress;
        private final String recaptcha_challenge_field;
        private final String recaptcha_response_field;

        ReCaptchaWorker(ReCaptcha reCaptcha,
                        String remoteAddress,
                        String recaptcha_challenge_field,
                        String recaptcha_response_field) {
            this.reCaptcha = reCaptcha;
            this.remoteAddress = remoteAddress;
            this.recaptcha_challenge_field = recaptcha_challenge_field;
            this.recaptcha_response_field = recaptcha_response_field;
        }

        public Boolean call() {
            // Skip the remote call entirely when either form field is missing.
            if (StringUtils.isEmpty(recaptcha_response_field)
                    || StringUtils.isEmpty(recaptcha_challenge_field)) {
                logger.info("Missing captcha fields.");
                return false;
            }

            // The remote call; this is the part that can stall.
            ReCaptchaResponse captchaResponse =
                    reCaptcha.checkAnswer(remoteAddress,
                                          recaptcha_challenge_field,
                                          recaptcha_response_field);
            if (!captchaResponse.isValid()) {
                logger.error("Recaptcha error=" +
                             captchaResponse.getErrorMessage());
                return false;
            }
            return true;
        }
    }

Next, I created an ExecutorService backed by a pool of threads. I decided to name the threads and make them daemon threads, so I could shut down Tomcat without having to kill the process.

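    ExecutorService executor;
    // AtomicInteger (java.util.concurrent.atomic) keeps the counter safe when
    // several request threads cause the pool to create workers at once.
    static final AtomicInteger captchaWorkerCounter = new AtomicInteger(0);
    public CaptchaHelper() {
        ThreadFactory factory = new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r);
                // Daemon threads do not keep the JVM alive at shutdown.
                t.setDaemon(true);
                t.setName("CaptchaWorker-" + captchaWorkerCounter.getAndIncrement());
                return t;
            }
        };
        executor = Executors.newCachedThreadPool(factory);
    }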

IMPORTANT SAFETY TIP:  Don’t forget to create the Thread with the Runnable.  Otherwise your code won’t run.  Don’t ask how I know this.  I just wish I could get that hour back…
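To see why that tip matters, here is a minimal sketch that compares a factory that drops the Runnable with one that passes it along. The class name ThreadFactoryPitfall and its helper method are hypothetical, made up for this illustration. A Thread built without a Runnable starts and exits immediately, so the submitted task never executes and the Future never completes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; the names are illustrative only.
class ThreadFactoryPitfall {
    // Bug: the Runnable handed in by the pool is ignored.
    static final ThreadFactory BROKEN = new ThreadFactory() {
        public Thread newThread(Runnable r) {
            Thread t = new Thread();
            t.setDaemon(true);
            return t;
        }
    };

    // Correct: the Runnable is passed to the Thread constructor.
    static final ThreadFactory CORRECT = new ThreadFactory() {
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        }
    };

    /** Submits a trivial task and reports whether it finished within half a second. */
    static boolean taskRuns(ThreadFactory factory) {
        ExecutorService executor = Executors.newCachedThreadPool(factory);
        Future<String> result = executor.submit(new Callable<String>() {
            public String call() {
                return "done";
            }
        });
        try {
            result.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The task was accepted but no thread ever ran it.
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("broken factory runs task:  " + taskRuns(BROKEN));
        System.out.println("correct factory runs task: " + taskRuns(CORRECT));
    }
}
```

Nothing throws at submit time with the broken factory, which is what makes the mistake hard to spot: the only symptom is a result that never arrives.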

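Calling the web service is easy. I have left the exception handling out for clarity. The key is the get() method. It waits up to 10 seconds for a result; if none arrives in that time, it throws a TimeoutException. Keep in mind that the timeout only releases the calling thread. The worker keeps running until the remote call returns, unless you cancel the Future.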

    ReCaptchaWorker worker = new ReCaptchaWorker(
            reCaptcha, remoteAddr,
            recaptcha_challenge_field,
            recaptcha_response_field);
    Future<Boolean> captchaTask = executor.submit(worker);
    captchaOK = captchaTask.get(10, TimeUnit.SECONDS);
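The exception handling left out above could be sketched roughly as follows. This is one possible shape, not the production code: the TimedCall class and callWithTimeout method are hypothetical names, the sleeping Callable stands in for the remote captcha call, and treating every failure as "not OK" is an assumption about the desired policy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; the names are illustrative only.
class TimedCall {
    /**
     * Runs the task on the executor and waits at most timeoutMillis for its result.
     * Returns false if the task times out, throws, or the caller is interrupted.
     */
    static boolean callWithTimeout(ExecutorService executor,
                                   Callable<Boolean> task,
                                   long timeoutMillis) {
        Future<Boolean> future = executor.submit(task);
        try {
            return Boolean.TRUE.equals(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // get() only stops waiting; cancel(true) also interrupts the worker thread.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The task itself threw; the cause is available via e.getCause().
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the caller and abandon the task.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        // Stand-ins for the remote call: one answers at once, one stalls.
        Callable<Boolean> fast = new Callable<Boolean>() {
            public Boolean call() {
                return true;
            }
        };
        Callable<Boolean> slow = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                Thread.sleep(5000);
                return true;
            }
        };
        System.out.println(callWithTimeout(executor, fast, 1000)); // prints true
        System.out.println(callWithTimeout(executor, slow, 200));  // prints false
        executor.shutdownNow();
    }
}
```

Note that cancel(true) only delivers an interrupt. A worker blocked in socket I/O may not respond to it, so setting connect and read timeouts on the HTTP client is still worth doing if the client library supports them.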

Now we have a thread pool, a worker and a way to time out the request, all without writing any notify/wait or synchronization blocks. Pretty cool!

Why is this important? Code that performs well and handles unexpected delays is a key part of writing “boring software”.   Your users expect your code to be fast and predictable.  This technique can help your code meet those goals.

Credits: I picked up some examples from Stack Overflow (specifically http://stackoverflow.com/questions/536327/is-it-a-good-way-to-use-java-util-concurrent-futuretask), and I found this tutorial by Jakob Jenkov (http://tutorials.jenkov.com/java-util-concurrent/executorservice.html) useful when I messed up the newThread code.
