(where apparently the forum move deleted all my screenshots, but the graphs were the same as your graph here)
In my case, it wasn’t a memory leak, but the garbage collector taking too long to kick in and subsequently taking too long to clean everything up, stalling all threads in the process.
Switching to the G1 garbage collector (G1GC) saved the day, as it cleans up in smaller increments, causing shorter (even unnoticeable) stalls.
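If you want to check whether GC is the culprit on your side too, here is a minimal sketch that asks the running JVM which collectors are active and how much time they have accumulated. It only uses the standard `java.lang.management` API; the class name `GcStats` and the helper `totalGcTimeMillis` are just names I made up for illustration. Note that the reported time is a running total per collector, not the length of individual stalls, so treat it as a rough indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Hypothetical helper: sums the accumulated collection time (in ms)
    // over all collectors. The JVM reports -1 when the value is undefined,
    // so non-positive values are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: its name, how often it ran, and total time spent.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

The collector names printed depend on the JVM and on which collector is selected, so they are also a quick way to confirm that a switch to G1 (on HotSpot, the `-XX:+UseG1GC` flag) actually took effect.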
Thanks a lot for the information. I am actually testing G1GC, and I’ll let it run over the weekend since production is stopped for that period. For the moment it looks good.