Retry Mechanism
Appmixer 6.4 introduces a redesigned retry mechanism that provides intelligent error handling, flow fairness, and graceful recovery from system downtime. The new system automatically determines which errors should be retried, prevents any single flow from monopolizing retry resources, and safely handles large retry backlogs without crashing the system.
The retry mechanism is fully backward compatible and enabled by default with production-ready settings. Advanced users can customize the behavior through environment variables.
Error Classification System
The error classification system intelligently determines which errors should be retried based on error codes and types. This prevents wasting resources retrying errors that will never succeed (like 404 Not Found or 401 Unauthorized).
RETRY_ERROR_CLASSIFICATION_ENABLED
Enable or disable intelligent error classification. When enabled, only retriable errors are retried. When disabled, all errors are retried (backward compatible behavior).
Default value: true
Non-retriable errors (sent directly to UnprocessedMessages):
Client errors: 400, 401, 403, 404, 405, 406, 409, 410, 411, 422, 451
Configuration errors: EACCES, EINVAL, ENOENT
Validation errors
Retriable errors (will be retried):
Server errors: 500, 502, 503, 504
Timeout errors: 408, ETIMEDOUT, ESOCKETTIMEDOUT
Connection errors: ECONNREFUSED, ECONNRESET, ENOTFOUND, EHOSTUNREACH
Rate limiting: 429
Network errors: EAI_AGAIN
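The two lists above can be read as a lookup table. The following is a minimal sketch of that lookup, not Appmixer's implementation: the function name `classify` and the set names are assumptions made for illustration, and validation errors are left out because the text does not give codes for them.

```python
from typing import Optional, Union

# Codes copied from the lists above; the container names are hypothetical.
NON_RETRIABLE = {
    400, 401, 403, 404, 405, 406, 409, 410, 411, 422, 451,  # client errors
    "EACCES", "EINVAL", "ENOENT",                            # configuration errors
}
RETRIABLE = {
    500, 502, 503, 504,                                      # server errors
    408, "ETIMEDOUT", "ESOCKETTIMEDOUT",                     # timeouts
    "ECONNREFUSED", "ECONNRESET", "ENOTFOUND", "EHOSTUNREACH",  # connection errors
    429,                                                     # rate limiting
    "EAI_AGAIN",                                             # network errors
}


def classify(code: Union[int, str]) -> Optional[bool]:
    """Return True (retry), False (send to UnprocessedMessages) or None (unknown)."""
    if code in RETRIABLE:
        return True
    if code in NON_RETRIABLE:
        return False
    return None


print(classify(503), classify(404), classify(418))
```

A code that appears in neither list comes back as `None`, which is the case RETRY_UNKNOWN_ERRORS is meant to decide.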
RETRY_ERROR_CLASSIFICATIONS
Custom error code overrides for specific error types. This allows you to customize which errors are retriable for your specific integrations.
Format: JSON object {"errorCode": boolean, ...}
Default value: {} (uses built-in classification rules)
Example:
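As an illustration of the documented format, the sketch below parses a JSON object of overrides and checks that every value is a boolean. The override values (`404` retriable, `503` not) and the helper name `parse_overrides` are hypothetical and chosen only to show the shape of the setting.

```python
import json
from typing import Dict


def parse_overrides(raw: str) -> Dict[str, bool]:
    """Parse a RETRY_ERROR_CLASSIFICATIONS-style JSON object into {code: retriable}."""
    data = json.loads(raw or "{}")  # an empty value behaves like the default {}
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for code, retriable in data.items():
        if not isinstance(retriable, bool):
            raise ValueError(f"value for {code!r} must be true or false")
    return data


# Hypothetical override: retry 404 responses, never retry 503 responses.
overrides = parse_overrides('{"404": true, "503": false}')
print(overrides)
```

Because JSON object keys are always strings, numeric status codes arrive as `"404"` rather than `404`, so any lookup against the overrides would need to compare string forms.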
RETRY_UNKNOWN_ERRORS
Default behavior for unknown or unclassified errors.
Options:
true - Retry unknown errors (fail-open, safer). Default.
false - Don't retry unknown errors (fail-closed, more conservative).
Example:
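One way to picture how this setting interacts with the classification rules is a three-step decision: custom overrides first, then built-in rules, then the unknown-error default. That ordering is an assumption inferred from the descriptions here, and the helper names `parse_bool` and `should_retry` are hypothetical.

```python
from typing import Dict, Optional


def parse_bool(raw: Optional[str], default: bool = True) -> bool:
    """Interpret an environment-style string; an unset value falls back to the default."""
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() == "true"


def should_retry(code: str, overrides: Dict[str, bool],
                 builtin: Dict[str, bool], retry_unknown: bool) -> bool:
    """Assumed precedence: overrides, then built-in rules, then the unknown default."""
    if code in overrides:
        return overrides[code]
    if code in builtin:
        return builtin[code]
    return retry_unknown


builtin = {"503": True, "404": False}  # tiny excerpt of the built-in lists

# An unclassified code follows the RETRY_UNKNOWN_ERRORS-style flag.
print(should_retry("418", {}, builtin, parse_bool(None)))      # fail-open default
print(should_retry("418", {}, builtin, parse_bool("false")))   # fail-closed
```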
Retry Backoff Configuration
Controls the time intervals between retry attempts.
RETRY_BACKOFF
Comma-separated list of time intervals between retry attempts. The system will retry using these intervals in sequence.
Default value: "1,5,60,300,720" (1 minute, 5 minutes, 1 hour, 5 hours, 12 hours)
Example - faster retries:
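To compare schedules, it can help to add up the intervals. The sketch below parses a comma-separated list and sums it. The faster schedule `"1,2,5,10,30"` is a hypothetical value, and `parse_backoff` is an illustrative helper rather than part of Appmixer.

```python
from typing import List


def parse_backoff(raw: str) -> List[int]:
    """Split a RETRY_BACKOFF-style string into a list of positive integers."""
    intervals = [int(part) for part in raw.split(",") if part.strip()]
    if not intervals or any(i <= 0 for i in intervals):
        raise ValueError("expected a comma-separated list of positive integers")
    return intervals


default = parse_backoff("1,5,60,300,720")   # documented default, in minutes
faster = parse_backoff("1,2,5,10,30")       # hypothetical faster schedule

print(sum(default))  # 1086 minutes, roughly 18 hours from first failure to last retry
print(sum(faster))   # 48 minutes
```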
RETRY_BACKOFF_UNITS
Time unit for the backoff intervals.
Options: seconds, minutes, hours, days
Default value: minutes
Example - backoff in seconds:
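The unit setting changes how the same numbers are interpreted. As a rough sketch, the helper below (a hypothetical `backoff_delays`, not an Appmixer function) turns a schedule plus a unit into concrete durations. The `"10,30,60"` schedule is an invented example value.

```python
from datetime import timedelta
from typing import List

UNITS = ("seconds", "minutes", "hours", "days")  # the documented options


def backoff_delays(raw: str, units: str = "minutes") -> List[timedelta]:
    """Combine a RETRY_BACKOFF-style list with a RETRY_BACKOFF_UNITS-style unit."""
    if units not in UNITS:
        raise ValueError(f"units must be one of {UNITS}")
    # timedelta accepts each unit name as a keyword argument.
    return [timedelta(**{units: int(part)}) for part in raw.split(",")]


# Hypothetical schedule interpreted as seconds: 10 s, 30 s, then 60 s.
print(backoff_delays("10,30,60", "seconds"))

# The documented default, interpreted as minutes, ends with a 12-hour wait.
print(backoff_delays("1,5,60,300,720")[-1])
```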
Retry Quotas
Retry quotas prevent excessive retry attempts from overwhelming the system. When a quota is exceeded, messages are saved to UnprocessedMessages instead of being retried.
QUOTA_CONTEXT_RETRY
JSON configuration for retry limits at system, user, and flow levels.
Default quotas:
System level: 100,000 retries per hour (global limit across all users)
User level: 10,000 retries per hour per user
Flow level: 1,000 retries per hour per flow
Format: JSON array of quota rules
Example - custom quotas:
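The rule schema for QUOTA_CONTEXT_RETRY is not reproduced here, so the sketch below does not show the JSON itself. It only illustrates, under stated assumptions, how limits at the three levels could interact: a retry is allowed only when the system, user, and flow counters are all under their limits. The class `RetryQuota`, the level names, and the small custom limits are hypothetical, and the hourly window reset is not modeled.

```python
from collections import Counter
from typing import Dict

# Default hourly limits quoted above; the key names are illustrative only.
DEFAULT_LIMITS = {"system": 100_000, "user": 10_000, "flow": 1_000}


class RetryQuota:
    """Counts retries within a single hourly window (window reset is not modeled)."""

    def __init__(self, limits: Dict[str, int] = DEFAULT_LIMITS) -> None:
        self.limits = dict(limits)
        self.counts: Counter = Counter()

    def allow(self, user_id: str, flow_id: str) -> bool:
        """Return True to retry, False to send the message to UnprocessedMessages."""
        keys = {"system": "system", "user": f"user:{user_id}", "flow": f"flow:{flow_id}"}
        if any(self.counts[key] >= self.limits[level] for level, key in keys.items()):
            return False
        for key in keys.values():
            self.counts[key] += 1
        return True


# Tiny hypothetical limits so the effect is visible after a few calls.
quota = RetryQuota({"system": 5, "user": 3, "flow": 2})
results = [quota.allow("u1", "f1") for _ in range(3)]
print(results)  # the third retry for flow f1 exceeds the flow-level limit
```

A rejected retry does not consume quota in this sketch, so one noisy flow cannot use up the user-level or system-level budget once its own limit is reached.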
Retry quotas can also be configured dynamically through the Backoffice System Configuration page.
Advanced Tuning (Optional)
These settings control the DelayedMessages job that processes retries. The default values are production-ready for most deployments.
DELAYED_MESSAGE_CONCURRENCY_GREEN_STATE
Maximum retry messages processed per second when the system is healthy (InputQueue in green state).
Default value: 20 messages/second
DELAYED_MESSAGE_CONCURRENCY_YELLOW_STATE
Maximum retry messages processed per second when the system is under stress (InputQueue in yellow state).
Default value: 5 messages/second
When InputQueue reaches the red state, retry processing automatically slows to 1 message/second to prevent system overload.
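The three rates described above amount to a small state-to-throughput mapping. The following sketch shows one way to express it. The function name `retry_rate` is an assumption, and reading the values from a plain mapping is a simplification for illustration.

```python
import os
from typing import Mapping

DEFAULTS = {"green": 20, "yellow": 5}  # documented defaults, messages per second
RED_RATE = 1                           # fixed slowdown described for the red state


def retry_rate(state: str, env: Mapping[str, str] = os.environ) -> int:
    """Return the maximum retry messages per second for an InputQueue state."""
    if state == "red":
        return RED_RATE
    if state not in DEFAULTS:
        raise ValueError(f"unknown InputQueue state: {state!r}")
    name = f"DELAYED_MESSAGE_CONCURRENCY_{state.upper()}_STATE"
    return int(env.get(name, DEFAULTS[state]))


# With nothing configured, the documented defaults apply.
print(retry_rate("green", {}), retry_rate("yellow", {}), retry_rate("red", {}))
```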
DELAYED_MESSAGES_BATCH_SIZE_PER_FLOW
The maximum number of retry messages processed per flow per round in the fair scheduling algorithm. This ensures no single flow monopolizes retry processing.
Default value: 100 messages per flow per round
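The effect of a per-flow batch limit can be sketched with a simple round-robin queue. This is an illustration of the general technique, not Appmixer's scheduler: `fair_rounds` is a hypothetical helper, and the time window setting is not modeled.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def fair_rounds(backlog: Dict[str, List[str]],
                batch_size: int = 100) -> List[Tuple[str, List[str]]]:
    """Return the order in which (flow_id, batch) pairs would be processed."""
    queue: Deque[Tuple[str, List[str]]] = deque(
        (flow, list(msgs)) for flow, msgs in backlog.items() if msgs
    )
    schedule: List[Tuple[str, List[str]]] = []
    while queue:
        flow, msgs = queue.popleft()
        schedule.append((flow, msgs[:batch_size]))   # at most batch_size per turn
        rest = msgs[batch_size:]
        if rest:
            queue.append((flow, rest))               # back of the line for the remainder
    return schedule


# Flow A has a large backlog, flow B a single message; batch size 2 for readability.
backlog = {"A": ["a1", "a2", "a3", "a4", "a5"], "B": ["b1"]}
for flow, batch in fair_rounds(backlog, batch_size=2):
    print(flow, batch)
```

Flow B's single message is handled right after flow A's first batch instead of waiting behind all five of A's messages.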
DELAYED_MESSAGES_TIME_WINDOW_MS
Time window size for round-robin scheduling between flows.
Default value: 10000 milliseconds (10 seconds)
Example - high-volume environment tuning:
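Before changing the concurrency settings, a back-of-the-envelope estimate of how long a backlog takes to drain can be useful. The sketch below does that arithmetic. The backlog of 700,000 messages matches the tested figure mentioned elsewhere on this page, while the rate of 50 messages/second is a hypothetical tuned value for DELAYED_MESSAGE_CONCURRENCY_GREEN_STATE, not a recommendation. The estimate ignores newly arriving retries, quota limits, and state changes.

```python
def drain_hours(backlog: int, messages_per_second: int) -> float:
    """Hours needed to process a retry backlog at a constant rate."""
    if messages_per_second <= 0:
        raise ValueError("rate must be positive")
    return backlog / messages_per_second / 3600


print(round(drain_hours(700_000, 20), 1))  # 9.7 hours at the green-state default
print(round(drain_hours(700_000, 5), 1))   # 38.9 hours at the yellow-state default
print(round(drain_hours(700_000, 50), 1))  # 3.9 hours at the hypothetical tuned rate
```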
Key Behavioral Changes
The redesigned retry mechanism introduces several important improvements:
Intelligent Error Handling: Not all errors are retried. Most client errors (4XX, with the exception of 408 and 429) and validation errors go directly to UnprocessedMessages, preventing wasted retry attempts.
Flow Fairness: Round-robin scheduling ensures no single flow can monopolize retry processing. By default, each flow processes a maximum of 100 messages per round (DELAYED_MESSAGES_BATCH_SIZE_PER_FLOW) before the scheduler moves to the next flow.
Graceful Recovery: The system safely handles large retry backlogs (tested with 700,000+ accumulated retries) by dynamically adjusting processing speed based on InputQueue health.
Quota Protection: Retry quotas prevent runaway retry loops and protect system resources.
Circuit Breaker Integration: Retry processing automatically slows down or stops when the system is under stress, then resumes when capacity is available.
Backward Compatibility
The retry mechanism redesign is fully backward compatible:
No action required: Default settings work for existing deployments
No database changes: Works with the existing MongoDB schema
Graceful migration: Existing delayed messages are processed with the new system
Monitoring & Troubleshooting
Monitor retry health:
Watch InputQueue health status (green/yellow/red) in logs
Track retry backlog size in the MongoDB delayedMessages collection
Monitor quota usage in system logs
Set up alerts for RED/BLACK circuit breaker states
Common troubleshooting:
High retry backlog: Increase concurrency settings or investigate the root cause of errors
Quota exceeded errors: Review quota limits or fix underlying integration issues
Retries not processing: Check InputQueue health status and circuit breaker state
