Monitoring and Observability
Overview
When deploying Appmixer in a self-managed environment, implementing comprehensive monitoring and observability is crucial for maintaining system health, performance, and reliability. This guide provides recommendations for monitoring both the infrastructure components and the Appmixer application itself.
Appmixer is a Node.js-based application that depends on several infrastructure components:
MongoDB - Primary data store
Redis - Caching
RabbitMQ - Message queue for asynchronous processing
Elasticsearch - Log storage and search
Logstash - Log processing
This document is organized into two main sections:
Application Monitoring - Appmixer-specific metrics and health indicators
Infrastructure Monitoring - Guidelines for monitoring the underlying services
Note: We are actively working on expanding this documentation. Future updates will include additional metrics, detailed dashboards, and downloadable configuration files (Grafana dashboards, Prometheus configs, alerting rules) to help you set up monitoring faster. Check back regularly for updates or contact support if you need assistance with your monitoring setup.
Application Monitoring
Appmixer Application Metrics
This section covers monitoring specific to the Appmixer Node.js application. The following areas should be monitored to ensure optimal application performance:
Application Health Endpoints
Appmixer provides two primary health check endpoints for monitoring application availability and system health:
Root Endpoint: GET /
The root endpoint provides a simple liveness check and returns basic API information. This endpoint does not require authentication and can be used for:
Kubernetes/OpenShift liveness probes - To detect if the pod is responsive
Load balancer health checks - To verify the service is accepting connections
Basic availability monitoring - To ensure the HTTP server is running
Response Format:
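A representative response is sketched below. The exact values (application name, version, URLs) depend on your deployment and Appmixer release, so treat them as placeholders:

```json
{
  "name": "appmixer-api",
  "version": "5.x.x",
  "url": "https://api.your-appmixer-domain.com",
  "studioUrl": "https://studio.your-appmixer-domain.com",
  "integrationsUrl": "https://integrations.your-appmixer-domain.com"
}
```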
Response Fields:
name - Application name
version - Appmixer version
url - API endpoint URL
studioUrl - Studio UI URL
integrationsUrl - Integrations marketplace URL
Monitoring Recommendations:
Use this endpoint for basic availability monitoring
Expected response time: < 100ms
Expected HTTP status: 200 OK
Alert if response time > 500ms or status != 200
Configure as liveness probe in OpenShift/Kubernetes
System Health Endpoint: GET /system/health
The system health endpoint provides detailed metrics about the Appmixer application's internal state and performance. This endpoint is designed for deep health monitoring and diagnostics.
Authentication:
Requires API key authentication OR
JWT authentication with admin scope
Response Format:
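An illustrative response shape is sketched below, assembled from the fields documented under Response Fields. The exact nesting and any additional fields may vary between Appmixer versions, and all values are placeholders:

```json
{
  "inputQueue": { "messageCount": 42 },
  "eventsListeners": {
    "total": 120,
    "byType": { "webhook": 80, "trigger": 40 }
  },
  "events": 5,
  "listeners": 37,
  "slowQueue": {
    "count": 2,
    "top": [
      { "flowId": "<flow-id>", "count": 14 }
    ]
  },
  "systemWebhooks": { "registered": 10, "active": 10 }
}
```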
Response Fields:
inputQueue.messageCount - Number of messages waiting in the main input queue for processing
This is the most critical metric for monitoring system load
Normal range: 0-1000 messages
Warning threshold: > 2000 messages
Critical threshold: > 10000 messages
High queue length indicates a processing bottleneck or insufficient resources
eventsListeners - Statistics about registered event listeners (webhooks, triggers)
total - Total number of active event listeners
byType - Breakdown by listener type
Helps monitor integration activity
events - Total count of events in the system
Represents pending events waiting for listeners to process
High count may indicate listener processing issues
listeners - Count of flow listeners
Number of components waiting for incoming data
slowQueue.count - Number of flows currently in the slow queue
Flows experiencing repeated failures or slow performance
Should be monitored for troubleshooting
slowQueue.top - Top 100 flows by slow queue occurrence
Identifies problematic flows requiring attention
Each entry includes flowId and count
systemWebhooks - System webhook statistics (only available on worker nodes)
registered - Number of system webhooks configured
active - Number of currently active webhooks
Monitoring Recommendations:
Critical Metrics to Monitor:
inputQueue.messageCount - Alert if > 2000 (warning) or > 10000 (critical)
HTTP status - Alert if not 200 OK
Warning Indicators:
slowQueue.count increasing over time
events count growing continuously
Response time degradation
Alert Examples:
Critical: Input queue > 10000 messages for > 5 minutes
Warning: Input queue > 2000 messages for > 10 minutes
Warning: Slow queue count > 10 flows
Info: System webhooks registered != active
Dashboard Visualization:
Line chart: inputQueue.messageCount over time
Gauge: Current input queue size vs thresholds
Table: Top slow queue flows
Counter: Total events, listeners, actions
Example cURL Request:
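A minimal sketch, assuming JWT authentication with an admin-scope token passed in the standard Authorization header. The base URL and token are placeholders; if you use API key authentication instead, supply the key the way your Appmixer instance is configured to expect it.

```bash
# Placeholders: replace the base URL and token with values from your deployment.
curl -s \
  -H "Authorization: Bearer $APPMIXER_ADMIN_TOKEN" \
  "https://api.your-appmixer-domain.com/system/health"
```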
Integration with Monitoring Tools:
Prometheus scraper configuration example:
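The /system/health endpoint returns JSON rather than the Prometheus text format, so the sketch below assumes a translation layer such as the community json_exporter (or a small custom exporter) sits in front of it. Job names, targets, ports, and scrape intervals are placeholders to adapt to your environment.

```yaml
# Sketch only: assumes a json_exporter instance converts the JSON returned by
# /system/health into Prometheus metrics.
scrape_configs:
  - job_name: 'appmixer-system-health'
    metrics_path: /probe
    scrape_interval: 30s
    static_configs:
      - targets:
          - https://api.your-appmixer-domain.com/system/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: json-exporter:7979   # address of the json_exporter service
```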
Flow Execution Metrics
This section will be expanded in a future update to include metrics for workflow execution times, success/failure rates, and active flow monitoring. For now, we recommend monitoring the inputQueue.messageCount and slowQueue metrics from the /system/health endpoint as primary indicators of flow execution health.
Component Performance
Detailed component-level performance monitoring guidance is planned for a future documentation update. In the meantime, standard Node.js application monitoring practices apply, and errors from individual connectors will appear in your Elasticsearch logs.
Infrastructure Monitoring
General Principles
For each infrastructure component, we recommend monitoring:
Availability - Is the service up and responding?
Performance - Response times, throughput, and latency
Resource Utilization - CPU, memory, disk, and network usage
Error Rates - Connection errors, timeouts, and failures
Capacity Planning - Trends for storage, connections, and load
MongoDB Monitoring
MongoDB is the primary data store for Appmixer. Monitor the following metrics:
Key Metrics
Database Performance
Query execution time (slow queries)
Operations per second (reads/writes)
Document scan rates
Index usage and efficiency
Replication (if using replica sets)
Replication lag
Oplog window
Member health status
Election events
Resource Usage
Memory utilization (resident and virtual)
Disk I/O (read/write operations)
Disk space usage and growth rate
Network throughput
Recommended Tools
MongoDB Atlas (for managed MongoDB)
MongoDB Cloud Manager / Ops Manager
Prometheus with MongoDB exporter
Datadog, New Relic, or similar APM tools
Alert Thresholds (Examples)
Replication lag > 10 seconds
Disk usage > 80%
Connection pool exhaustion > 90%
Slow queries > 1000ms
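For ad-hoc verification of these metrics outside your monitoring stack, a few mongosh spot checks can help; the connection string and database name below are placeholders.

```bash
# Spot checks from mongosh; adjust the connection string to your deployment.
mongosh "mongodb://mongodb-host:27017/appmixer" --eval '
  printjson(db.serverStatus().opcounters);    // reads/writes per operation type
  printjson(db.serverStatus().connections);   // current vs. available connections
  printjson(db.stats());                      // data size, storage size, index size
  rs.printSecondaryReplicationInfo();         // replication lag (replica sets only)
'
```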
Redis Monitoring
Redis is used for caching. Monitor the following:
Key Metrics
Availability
Uptime
Master-slave sync status
Connection success rate
Recommended Tools
Redis INFO command
Prometheus with Redis exporter
RedisInsight
Cloud-native monitoring (if using managed Redis)
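The Redis INFO command listed above is also handy for quick manual checks of the availability and resource metrics; host, port, and any authentication options below are placeholders.

```bash
# Ad-hoc checks with redis-cli; add -a/--tls options if your deployment requires them.
redis-cli -h redis-host -p 6379 INFO memory | grep -E 'used_memory_human|maxmemory_human'
redis-cli -h redis-host -p 6379 INFO replication | grep -E 'role|master_link_status'
redis-cli -h redis-host -p 6379 INFO stats | grep -E 'keyspace_hits|keyspace_misses|rejected_connections'
```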
RabbitMQ Monitoring
RabbitMQ handles asynchronous message processing for Appmixer workflows and tasks.
Key Metrics
Queue Health
Queue length (messages ready)
Messages unacknowledged
Message rate (publish/deliver/ack)
Queue growth rate
Connection and Channels
Failed connection attempts
Node Health
Memory usage (high/low watermarks)
Disk space (free/used)
Cluster Health (if clustered)
Node availability
Network partition events
Mirror queue synchronization
Recommended Tools
RabbitMQ Management Plugin
Prometheus with RabbitMQ exporter
Datadog, New Relic, or similar APM tools
Alert Thresholds (Examples)
Queue length growing beyond normal capacity
Memory usage > 80% of high watermark
No consumers on critical queues
Disk space < 20% free
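For continuous collection, use the Management Plugin or the Prometheus exporter listed above; for quick spot checks of queue depth and node health, the standard CLI tools can be used directly on a RabbitMQ node, as sketched below.

```bash
# Queue depth, unacknowledged messages, and consumer counts per queue
rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

# Node-level health: memory usage vs. watermark, free disk space, alarms
rabbitmq-diagnostics status
rabbitmq-diagnostics check_local_alarms
```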
Elasticsearch Monitoring
Elasticsearch provides search capabilities and stores operational data.
Key Metrics
Cluster Health
Cluster status (green/yellow/red)
Number of nodes
Resource Usage
JVM heap usage
JVM garbage collection time
CPU usage per node
Disk I/O per node
Storage
Total disk space used
Index size growth rate
Recommended Tools
Kibana Monitoring
Elasticsearch Monitoring API
Prometheus with Elasticsearch exporter
Cloud-native monitoring (if using managed Elasticsearch)
Alert Thresholds (Examples)
Cluster status = red or yellow for > 5 minutes
JVM heap usage > 85%
Disk usage > 85%
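The cluster health and cat APIs can be queried directly to spot-check these metrics; the host below is a placeholder, and you may need to add credentials or TLS options depending on how your cluster is secured.

```bash
# Cluster status, node count, and unassigned shards
curl -s 'http://elasticsearch-host:9200/_cluster/health?pretty'

# Per-node heap, memory, disk, and CPU usage
curl -s 'http://elasticsearch-host:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,disk.used_percent,cpu'

# Index sizes, sorted largest first, to track storage growth
curl -s 'http://elasticsearch-host:9200/_cat/indices?v&s=store.size:desc'
```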
Logstash Monitoring
Logstash processes and transforms log data before sending it to Elasticsearch.
Key Metrics
Pipeline Performance
Events received/filtered/sent
Recommended Tools
Logstash Monitoring API
Kibana Monitoring
Prometheus with Logstash exporter
Alert Thresholds (Examples)
JVM heap usage > 85%
Pipeline events duration increasing
Dead letter queue growing
Plugin errors increasing
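The Logstash Monitoring API listed above is exposed on port 9600 by default and can be queried directly; the host is a placeholder.

```bash
# Node-level JVM heap usage and process stats
curl -s 'http://logstash-host:9600/_node/stats/jvm?pretty'

# Per-pipeline event counts (in/filtered/out) and queue statistics
curl -s 'http://logstash-host:9600/_node/stats/pipelines?pretty'
```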
OpenShift Platform Monitoring
Since Appmixer runs on OpenShift, leverage OpenShift's built-in monitoring capabilities:
Key Aspects
Pod Health
Pod status (Running/Failed/Pending)
Restart counts
Container resource usage vs limits
Resource Quotas
Namespace CPU/memory usage
Storage usage
Pod count vs limits
Network
Service availability
Ingress/route response times
Network policy effectiveness
Persistent Volumes
Volume usage
Volume performance metrics
Volume mount issues
Tools
OpenShift Monitoring (Prometheus-based)
OpenShift Web Console
oc CLI monitoring commands
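A few oc commands useful for spot checks of pod health and resource usage; the namespace is a placeholder for wherever Appmixer is deployed.

```bash
# Pod status and restart counts in the Appmixer namespace
oc get pods -n appmixer -o wide

# Current CPU/memory consumption per pod (requires cluster metrics to be available)
oc adm top pods -n appmixer

# Recent warning events, e.g. failed probes, OOM kills, scheduling issues
oc get events -n appmixer --field-selector type=Warning --sort-by=.lastTimestamp
```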
Recommended Monitoring Stack
Option 1: Prometheus + Grafana (Open Source)
Prometheus for metrics collection
Grafana for visualization and dashboards
Alertmanager for alert routing
Exporters for each infrastructure component
Custom exporters for Appmixer application metrics
Option 2: Commercial APM Solutions
New Relic
Datadog
Dynatrace
AppDynamics
Option 3: Hybrid Approach
Use OpenShift's built-in Prometheus for infrastructure
Add custom Grafana dashboards
Integrate with existing enterprise monitoring tools
Alerting Strategy
Alert Severity Levels
Critical - Immediate action required, service degradation or outage
Production outage
Data loss risk
Security breach
Warning - Attention needed, potential issues developing
Resource usage approaching limits
Performance degradation
Increased error rates
Info - Informational, no immediate action needed
Deployment notifications
Configuration changes
Capacity planning indicators
Alert Best Practices
Define clear runbooks for each alert
Avoid alert fatigue by tuning thresholds
Use alert aggregation to reduce noise
Implement escalation policies
Test alerting channels regularly
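As a sketch of how severity levels and threshold tuning come together, the Prometheus alerting rules below encode the input queue thresholds from the application monitoring section. The metric name appmixer_input_queue_messages is an assumption: it presumes a custom exporter exposes inputQueue.messageCount under that name, and the runbook URL is a placeholder.

```yaml
groups:
  - name: appmixer-alerts
    rules:
      - alert: AppmixerInputQueueHigh
        expr: appmixer_input_queue_messages > 2000     # assumed metric name
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Appmixer input queue above warning threshold"
          runbook_url: "https://wiki.example.com/runbooks/appmixer-input-queue"
      - alert: AppmixerInputQueueCritical
        expr: appmixer_input_queue_messages > 10000    # assumed metric name
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Appmixer input queue above critical threshold"
```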
Logging Strategy
Log Aggregation
Set appropriate log retention policies based on compliance requirements and storage capacity
Log Levels
Use appropriate log levels:
ERROR - Application errors requiring attention
WARN - Warning conditions
INFO - Informational messages
DEBUG - Detailed diagnostic information (non-production)
Key Logs to Monitor
Application startup/shutdown events
Authentication and authorization failures
API request/response logs (with sampling)
Integration connector errors
Database connection issues
Message queue processing errors
Performance Tuning
Based on monitoring data, consider these tuning areas:
Node.js Application
Adjust worker thread pool size
Optimize memory limits and heap size
Enable clustering for horizontal scaling
Review and optimize slow database queries
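For the heap-size item above, the Node.js heap can be adjusted through standard V8 flags; the value below is only an example and should stay comfortably below the container memory limit.

```bash
# Example only: raise the V8 old-space heap limit to 4 GB for the Appmixer process.
export NODE_OPTIONS="--max-old-space-size=4096"
```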
Kubernetes Resources
Right-size CPU and memory requests/limits
Configure horizontal pod autoscaling (HPA)
Implement pod disruption budgets
Optimize persistent volume performance class
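A minimal horizontal pod autoscaler sketch for the HPA item above. The deployment name, namespace, replica counts, and CPU target are placeholders; confirm which Appmixer deployments are safe to scale horizontally in your setup before applying anything like this.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: appmixer-engine        # placeholder deployment name
  namespace: appmixer          # placeholder namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appmixer-engine
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```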
Capacity Planning
Regular capacity planning should review:
Growth Trends
User/tenant growth rate
Flow execution volume trends
Data storage growth
Resource Utilization
Average and peak CPU/memory usage
Database storage growth
Network bandwidth utilization
Performance Baselines
Establish performance baselines
Track deviation from baselines
Plan scaling activities before limits are reached
Compliance and Security Monitoring
Monitor access logs for suspicious activity
Track failed authentication attempts
Review audit logs for compliance requirements
Track SSL certificate expiration dates
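Certificate expiration can be checked manually with openssl (the host below is a placeholder), or tracked automatically by tools such as the Prometheus blackbox_exporter.

```bash
# Print the certificate expiry date for the Appmixer API endpoint
echo | openssl s_client -connect api.your-appmixer-domain.com:443 \
  -servername api.your-appmixer-domain.com 2>/dev/null | openssl x509 -noout -enddate
```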