This article provides guidance on resolving intermittent connection timeout issues that may occur when the Loome Agent is deployed within an Azure Virtual Network (VNet).
When the Loome Agent is deployed to an Azure Container Instance (ACI) behind an Azure Virtual Network, it may experience intermittent timeouts or dropped connections when communicating with our platform. This can lead to delays in data processing and unexpected failures.
By default, Azure Container Instances deployed to a VNet and placed on a delegated subnet without a configured outbound method (such as a NAT Gateway or Firewall) will use a shared pool of public IP addresses and a limited number of SNAT (Source Network Address Translation) ports for all outbound connections.
This can become an issue when an application, such as the Loome Agent, maintains persistent, long-lived connections. Each of these connections consumes a port from the shared pool. Because the agent’s connections hold on to these ports, they are not quickly released and returned to the pool. Consequently, in a high-volume environment, the shared pool can become exhausted, preventing new connections from being established and causing timeouts.
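The exhaustion mechanism described above can be illustrated with a toy model. This is a minimal sketch, not a representation of how Azure allocates SNAT ports: the `SnatPortPool` class, the `count_failures` helper, and the pool size of 4 are all hypothetical and chosen only to make the effect visible.

```python
class SnatPortPool:
    """Toy model of a fixed-size pool of outbound ports (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.in_use = set()

    def open_connection(self):
        """Allocate a free port, or raise if every port is already held."""
        for port in range(self.size):
            if port not in self.in_use:
                self.in_use.add(port)
                return port
        raise TimeoutError("no SNAT ports available")

    def close_connection(self, port):
        """Return a port to the pool."""
        self.in_use.discard(port)


def count_failures(pool_size, requests, hold):
    """Open `requests` connections; when `hold` is True, never release them."""
    pool = SnatPortPool(pool_size)
    failures = 0
    for _ in range(requests):
        try:
            port = pool.open_connection()
        except TimeoutError:
            failures += 1
            continue
        if not hold:
            # Short-lived connection: the port goes straight back to the pool.
            pool.close_connection(port)
    return failures


if __name__ == "__main__":
    print("long-lived connections, failures:", count_failures(4, 10, hold=True))
    print("short-lived connections, failures:", count_failures(4, 10, hold=False))
```

With ports held open, every request beyond the pool size fails, whereas releasing each port immediately lets the same small pool serve all requests.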
This is a networking limitation of the default Azure setup and not an issue with the Loome Agent itself.
We have released an updated version of the Loome Agent that is more resilient to network failures. This version includes enhanced retry logic and backoff strategies to gracefully handle connection drops and timeouts.
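For readers unfamiliar with the technique, retry with exponential backoff can be sketched as follows. This is an illustrative example only and not the Loome Agent's actual implementation: the function name `call_with_retry`, its parameters, and the default values are assumptions, and the random jitter is a common addition that the agent may or may not use.

```python
import random
import time


def call_with_retry(operation, attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call `operation`, retrying on connection errors with jittered backoff.

    The delay ceiling doubles after each failure (base, 2*base, 4*base, ...)
    up to `cap` seconds; the actual wait is a random value below the ceiling.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                # Out of retries: surface the last error to the caller.
                raise
            ceiling = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky():
        # Hypothetical operation that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("simulated dropped connection")
        return "ok"

    print(call_with_retry(flaky, base=0.01))
```

Spacing retries out in this way gives a congested port pool time to recover instead of adding more connection attempts at the moment it is exhausted.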
Recommended Action: We strongly recommend updating the Loome Agent to the latest version. This is the simplest and most effective solution to stabilize communication without requiring any changes to your Azure network configuration.
While we strongly recommend updating the agent, you can also configure an explicit outbound method at the network level if you require it for other services. An explicit outbound method provides a dedicated pool of IP addresses and SNAT ports, preventing exhaustion of the shared pool.
You have two primary options: an Azure NAT Gateway or an Azure Firewall.
Both options provide a robust solution by eliminating the shared IP address pool, but we recommend the NAT Gateway as the most straightforward and cost-effective option for addressing the specific SNAT port exhaustion issue.
The most direct solution is to upgrade to the latest Loome Agent. However, if you are managing a large-scale or complex Azure environment, configuring an explicit outbound route with an Azure NAT Gateway or Azure Firewall is a best practice that will provide network-wide stability for all your services.