Random 400 errors using the system.net.httpClient() GET function

Yeah, it is very strange that swapping those out makes a difference.

It's gotten stranger.

As best I can tell, the failed POST requests are not making it to the Azure server. Also, the test server on our network, which runs a full backup of the production application and POSTs to the same Azure server, has still never failed to POST.

I did find an interesting pattern between our test server on our network and the production server.

The TCP connection is opened on both if it's not already there, and the POST is always successful on both. After 2 minutes the TCP connection is closed, but only on the test server. The TCP connection on the production server remains open indefinitely, as far as TCPView is concerned anyway.

On the production server, if there's no connection, a new one's opened and the POST request works. I can also manually POST as much as I want for 2 minutes. After 2 minutes, the TCP connection is still there, but the next POST request always fails and the connection is closed. The next POST then opens a new connection and succeeds, as will any others for up to 2 minutes.

I've been able to reliably reproduce this behavior. It's like the TCP connection is told to close after 2 minutes but is left in some kind of failed state.
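For anyone who wants to narrow down the idle limit the same way, here is a rough sketch of the manual test as a script. It is not the script I ran. `send` is a hypothetical zero-argument function that does one POST over the reused connection and raises on failure, and `sleep` is injectable only so the loop can be checked without waiting.

```python
import time


def probe_idle_limits(send, idle_periods, sleep=time.sleep):
    """Return [(idle_seconds, second_request_ok), ...].

    send: hypothetical zero-argument callable that performs one POST over a
          reused connection and raises an exception on failure.
    idle_periods: idle gaps in seconds to test, e.g. [60, 110, 130, 300].
    """
    results = []
    for idle in idle_periods:
        try:
            send()  # warm-up: make sure a live connection exists
        except Exception:
            send()  # a stale connection is dropped on failure, so try once more
        sleep(idle)  # leave the connection idle for this long
        try:
            send()  # does the connection still work after the idle gap?
            ok = True
        except Exception:
            ok = False
        results.append((idle, ok))
    return results
```

If the behavior matches what I saw, gaps under about 2 minutes should come back True and longer ones False on the production server, while the test server should come back True for everything.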

Even stranger, in the evenings all the POSTs on the production server are successful. From about 7pm to 7 or 8am, the POSTs work. It's like someone's running something on the network during the day that's causing the issue, as little sense as that makes.

This turned out to be an issue with their temporary Starlink connectivity and its router.

Using TCPView and Wireshark, I could see that on their network I did not get a FIN packet from the Azure server. Something was purging the route to the TCP connection after two minutes or so, but the local TCP connection thought all was well until eight or so minutes later, when the script would try to POST again. That POST would fail because the Azure server had closed the connection on its end and the FIN packet it sent was never received. The failed attempt would close the local TCP connection, and ten minutes later, when the script ran again, it would create a new TCP connection and POST successfully.
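If someone is stuck behind similar hardware, one possible workaround is to retry once right away, since the failed attempt closes the stale connection and the next one opens a fresh one. This is only a sketch and not what fixed it here. `post_with_retry` and `send` are made-up names, and `send` stands in for whatever POST call the script already makes.

```python
def post_with_retry(send, retries=1):
    """Call send(); on failure, call it again up to `retries` more times.

    send: hypothetical zero-argument callable that performs the POST and
          raises an exception when the request fails.
    """
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return send()
        except Exception as err:  # ideally narrow this to the client's I/O error
            last_error = err  # remember the failure and try again on a new connection
    raise last_error
```

Retrying a POST is only safe when the failed request never reached the server, which is what I was seeing, or when the endpoint tolerates duplicates. Inside Ignition's Jython, Java exceptions may need their own except clause.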

After the on-site workday was over, and sometimes during the day, it seems there were enough connections available that the POSTs would work consistently, since an idle connection would last longer with fewer folks making connections.

They now have their permanent terrestrial internet with different hardware and all is well.
