When you say they're complex nested UDTs, how big (in bytes) is your base UDT? Are you reading all or most of the tags in these UDTs, or just some? What do your tag groups look like? Are you using direct mode for everything, or leased? I've found that making everything direct is usually faster, because it avoids the extra load of changing tags' poll rates, but that's just my opinion.
Have you tried Phil's driver to see if you get better results? It's a drop-in replacement for the Ignition driver.