The last Whiteboard Wednesday tackled a few of the gotchas of the F5 CMP implementation. I'm taking that as an opportunity to finish a blog post I started writing six months ago on this topic.

I've run into two scenarios where CMP got in my way:

  • Enforcing strict connection limits, especially with very low connection counts.
  • Application health checks, and the time you have to account for before a pool node stops receiving new connections.

All of this happened on TMOS 11.4.1+HF running on BIG-IP 2000-series systems. Since F5 sometimes moves very quickly, you might not experience these issues on other hardware or with a different software release.

applying low and strict connection limits

Apparently connection limits are counted per tmm process and are not shared between the tmm processes running on different CPUs. So if you have two CPU cores with hyper-threading, you'll see four tmm processes that all enforce the connection limit independently. With a max connection limit of two you might still see up to eight connections. That is a very bad thing if you abuse the F5 to limit your outbound connections because your counterpart is not able to enforce connection limits itself, but forces you via a contract to open no more than two connections. Disable CMP and everything lands on the same tmm process, and suddenly your connection limits work as expected.
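To make the effect concrete, here is a toy model in plain Python (not F5 code; the round-robin distribution onto tmms is an assumption) of each tmm enforcing the limit against its own private counter:

```python
# Toy model of per-tmm connection limiting (not F5 code).
# Each tmm process keeps its own counter and enforces the configured
# limit independently; there is no shared state between them.

TMM_COUNT = 4    # e.g. two CPU cores with hyper-threading
CONN_LIMIT = 2   # the "max connections" you configured

def accept_connection(counters, tmm_id):
    """Accept the connection only if this tmm is below the limit."""
    if counters[tmm_id] < CONN_LIMIT:
        counters[tmm_id] += 1
        return True
    return False

counters = [0] * TMM_COUNT
accepted = 0
# 20 incoming connections, distributed round-robin onto the tmms
for i in range(20):
    if accept_connection(counters, i % TMM_COUNT):
        accepted += 1

print(accepted)  # 8 = CONN_LIMIT * TMM_COUNT, not the 2 you configured
```

With shared state the loop would stop at two accepted connections; with per-tmm counters it stops at eight.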

If your connection limit is big enough you can cheat: say the limit is twelve connections, then you can configure a per-tmm limit of three. With four processes that makes a maximum of twelve connections, as required. Though due to the hashing and distribution among the tmm processes it can happen that one process rejects connections while others are idle. Another option is to enter the dirty land of iRules and tables to maintain your own connection count. There are examples in the F5 DevCentral community, but the best resolution you can get, without scanning the whole table on every request, is X requests per minute.
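The cheat is plain integer division, and it only works when the contractual limit is at least as large as the tmm count; a small sketch (the helper name is mine, not an F5 API):

```python
# Sketch: derive the per-tmm limit from a total (contractual) limit.
# Hypothetical helper, not an F5 API.

def per_tmm_limit(total_limit, tmm_count):
    """Largest per-tmm limit that keeps the cluster-wide maximum at or
    below total_limit. Returns 0 when the total is smaller than the
    tmm count, i.e. the limit cannot be expressed per tmm at all."""
    return total_limit // tmm_count

print(per_tmm_limit(12, 4))  # 3 -> cluster-wide max of 12, as required
print(per_tmm_limit(2, 4))   # 0 -> a limit of 2 can't be split over 4 tmms
```

Note that a non-divisible limit loses headroom: a contractual limit of ten over four tmms yields a per-tmm limit of two, i.e. a cluster-wide maximum of eight.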

active health checks

Active monitoring checks were added to detect the operational state of our application servers. Now imagine you have some realtime processing with over a hundred requests per minute. If one instance has issues, you'd like to remove it from the pool sooner rather than later, which requires frequent checking. So here is a simple active HTTP check:

defaults-from http-appmon
destination *:*
interval 9
recv "^HTTP/1\.1 200 OK"
send "GET /mon/state HTTP/1.1\r\nHost: MON\r\nConnection: Close\r\n\r\n"
time-until-up 0
timeout 2
up-interval 1

(Configuration options are described here)

With an up-interval of 1 second and a timeout of 2 seconds, someone naive like me expected that a fatal issue like a crashed application would be detected within three seconds and that no more requests would hit the application after second 3. That turned out to take longer: at least 4 s in this case, often more. For regular maintenance we also had to instruct our scripts to flip the application monitor to unavailable and then wait for (interval × tmm process count) + timeout + 1 s + maximum allowed request duration. That is not an instant off.
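Written out as code, the maintenance wait looks like this (the function and parameter names are mine; the numbers come from the monitor above and our setup):

```python
# Worst-case time to wait after flipping the monitor to "unavailable"
# before no request can still be in flight. Follows the rule of thumb
# from the text: each tmm only syncs the monitor state when it is due
# to run the health check itself.

def drain_wait(interval, tmm_count, timeout, max_request_duration, slack=1):
    return interval * tmm_count + timeout + slack + max_request_duration

# With the monitor above (up-interval 1 applies while the node is up)
# on 4 tmms, and requests that may take up to 2 s:
print(drain_wait(interval=1, tmm_count=4, timeout=2,
                 max_request_duration=2))  # 9 seconds, not an instant off
```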

If you use tcpdump (on the F5, where it's patched to show you the tmm that received the request) to look at the health check traffic, you'll notice that the tmm threads pass around the handle to perform the health check. So my guess is that the threads only sync the health check state of the application when they're due to run the check themselves. That makes sense performance-wise, but if you operate with longer timeouts and bigger intervals, you will definitely lose a notable amount of requests. Even in our low-end case we already lose a notable number of requests in 4 s. While you could resend them, you never know for sure whether a request was already processed or not, so you have to ensure that somewhere else in your stack.
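A rough back-of-the-envelope estimate of the damage (the pool size of two and the even load distribution are assumptions, not from the original setup):

```python
# Rough estimate of requests that can hit a dead instance before all
# tmms have noticed it is down. Assumes load is spread evenly over the
# pool members.

def lost_requests(rate_per_minute, detection_seconds, pool_size):
    """Requests routed to the failed node during the detection window."""
    per_second = rate_per_minute / 60.0
    return per_second * detection_seconds / pool_size

# ~100 req/min, a 4 s detection window, 2 nodes in the pool:
print(round(lost_requests(100, 4, 2), 1))  # ~3.3 requests per outage
```

Scale the rate up to a few thousand requests per minute and the same 4 s window loses requests in the hundreds.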

One option is to disable CMP, but I guess that's not an option for most people who need to scale out to handle the load, rather than running multiple processes only for the sake of redundancy. If you're running the balancer only for redundancy, the choice is easy.

You'll find a hint at this kind of issue if you look up the documentation for the inband monitor: "Note: Systems with multiple tmm processes use a per-process number to calculate failures, depending on the specified load balancing method." That case is a bit different due to the nature of the monitor, but it points in the same direction.

The actual check is executed by bigd according to this document, so I'm pretty sure I'm still missing some pieces of the picture. Just be warned that you really have to test your outage scenarios, and you should double-check them if you run a transaction-based service that is sensitive to lost requests.