Monitoring and Observability

Overview

When deploying Appmixer in a self-managed environment, implementing comprehensive monitoring and observability is crucial for maintaining system health, performance, and reliability. This guide provides recommendations for monitoring both the infrastructure components and the Appmixer application itself.

Appmixer is a Node.js-based application that depends on several infrastructure components:

MongoDB - Primary data store
Redis - Caching
RabbitMQ - Message queue for asynchronous processing
Elasticsearch - Log storage
Logstash - Log processing

This document is organized into two main sections:

Application Monitoring - Appmixer-specific metrics and health indicators
Infrastructure Monitoring - Guidelines for monitoring the underlying services

Note: We are actively working on expanding this documentation. Future updates will include additional metrics, detailed dashboards, and downloadable configuration files (Grafana dashboards, Prometheus configs, alerting rules) to help you set up monitoring faster. Check back regularly for updates or contact support if you need assistance with your monitoring setup.

Application Monitoring

Appmixer Application Metrics

This section covers monitoring specific to the Appmixer Node.js application. The following areas should be monitored to ensure optimal application performance:

Application Health Endpoints

Appmixer provides two primary health check endpoints for monitoring application availability and system health:

Root Endpoint: GET /

The root endpoint serves as a liveness check and returns basic API information. It does not require authentication and can be used for:

Kubernetes/OpenShift liveness probes - To detect if the pod is responsive
Load balancer health checks - To verify the service is accepting connections
Basic availability monitoring - To ensure the HTTP server is running

Response Format:

{
  "name": "appmixer",
  "version": "6.2.0",
  "url": "http://api.appmixer.com",
  "studioUrl": "https://studio.appmixer.com",
  "integrationsUrl": "https://studio.appmixer.com/integrations"
}

Response Fields:

name - Application name
version - Appmixer version
url - API endpoint URL
studioUrl - Studio UI URL
integrationsUrl - Integrations marketplace URL

Monitoring Recommendations:

Use this endpoint for basic availability monitoring
Expected response time: < 100ms
Expected HTTP status: 200 OK
Alert if response time > 500ms or status != 200
Configure as liveness probe in OpenShift/Kubernetes
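
The recommendations above can be turned into a small external probe. The following is a minimal sketch using only the Python standard library; the function names and the intermediate "degraded" label (for responses slower than the expected 100 ms but below the 500 ms alert level) are assumptions made for illustration, not part of Appmixer.

```python
import time
import urllib.error
import urllib.request
from typing import Tuple


def probe_root(base_url: str, timeout_s: float = 5.0) -> Tuple[int, float]:
    """Call GET / once and return (HTTP status, elapsed milliseconds).

    Connection failures raise urllib.error.URLError; callers should treat
    that as an alert as well.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout_s) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # non-2xx responses still carry a status code
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return status, elapsed_ms


def evaluate_liveness(status_code: int, elapsed_ms: float) -> str:
    """Classify one probe result using the thresholds listed above."""
    if status_code != 200 or elapsed_ms > 500:
        return "alert"      # status != 200 or response time > 500 ms
    if elapsed_ms >= 100:
        return "degraded"   # slower than the expected < 100 ms (label is an assumption)
    return "ok"
```

A scheduler or cron job could call evaluate_liveness(*probe_root("https://api.appmixer.com")) and forward anything other than "ok" to the alerting channel.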

System Health Endpoint: GET /system/health

The system health endpoint provides detailed metrics about the Appmixer application's internal state and performance. This endpoint is designed for deep health monitoring and diagnostics.

Authentication:

Requires either API key authentication or JWT authentication with admin scope

Response Format:

{
  "inputQueue": {
    "messageCount": 42
  },
  "eventsListeners": {
    "total": 150,
    "byType": {
      "webhook": 80,
      "scheduled": 70
    }
  },
  "events": 1523,
  "listeners": 245,
  "actions": 890,
  "slowQueue": {
    "count": 5,
    "top": [
      {
        "flowId": "flow-123",
        "count": 3
      }
    ]
  },
  "systemWebhooks": {
    "registered": 12,
    "active": 10
  }
}

Response Fields:

inputQueue.messageCount - Number of messages waiting in the main input queue for processing
- This is the most critical metric for monitoring system load
- Normal range: 0-1000 messages
- Warning threshold: > 2000 messages
- Critical threshold: > 10000 messages
- High queue length indicates processing bottleneck or insufficient resources
eventsListeners - Statistics about registered event listeners (webhooks, triggers)
- total - Total number of active event listeners
- byType - Breakdown by listener type
- Helps monitor integration activity
events - Total count of events in the system
- Represents pending events waiting for listeners to process
- High count may indicate listener processing issues
listeners - Count of flow listeners
- Number of components waiting for incoming data
slowQueue.count - Number of flows currently in the slow queue
- Flows experiencing repeated failures or slow performance
- Should be monitored for troubleshooting
slowQueue.top - Top 100 flows by slow queue occurrence
- Identifies problematic flows requiring attention
- Each entry includes flowId and count
systemWebhooks - System webhook statistics (only available on worker nodes)
- registered - Number of system webhooks configured
- active - Number of currently active webhooks
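
To show how these fields might be consumed, here is a minimal sketch that classifies the input queue and picks out the worst slow-queue flows from an already parsed response. The thresholds come from the field descriptions above; the function names are hypothetical.

```python
from typing import Any, Dict, List

WARNING_THRESHOLD = 2000    # warning threshold from the field description
CRITICAL_THRESHOLD = 10000  # critical threshold from the field description


def classify_input_queue(health: Dict[str, Any]) -> str:
    """Map inputQueue.messageCount to 'ok', 'warning' or 'critical'.

    Values between 1000 and 2000 are above the normal range but below the
    warning threshold; this sketch still reports them as 'ok'.
    """
    count = health["inputQueue"]["messageCount"]
    if count > CRITICAL_THRESHOLD:
        return "critical"
    if count > WARNING_THRESHOLD:
        return "warning"
    return "ok"


def worst_slow_flows(health: Dict[str, Any], limit: int = 5) -> List[Dict[str, Any]]:
    """Return up to `limit` entries of slowQueue.top, highest count first."""
    top = health.get("slowQueue", {}).get("top", [])
    return sorted(top, key=lambda entry: entry["count"], reverse=True)[:limit]
```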

Monitoring Recommendations:

Critical Metrics to Monitor:

inputQueue.messageCount - Alert if > 2000 (warning) or > 10000 (critical)
HTTP status - Alert if not 200 OK

Warning Indicators:

slowQueue.count increasing over time
events count growing continuously
Response time degradation
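
"Increasing over time" can be made concrete by fitting a trend line to recent samples. The sketch below computes a least-squares slope over equally spaced samples of slowQueue.count or events; the function names and the default minimum slope are assumptions to be tuned per deployment.

```python
from typing import Sequence


def slope_per_sample(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def is_growing(values: Sequence[float], min_slope: float = 0.0) -> bool:
    """True when the fitted trend rises faster than min_slope per sample."""
    return slope_per_sample(values) > min_slope
```

A slope-based check tolerates a single noisy sample better than comparing only the first and last values.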

Alert Examples:

Critical: Input queue > 10000 messages for > 5 minutes
Warning: Input queue > 2000 messages for > 10 minutes
Warning: Slow queue count > 10 flows
Info: System webhooks registered != active
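
Rules of the form "above a threshold for a given duration" need a history of samples rather than a single reading. The following sketch evaluates such a rule over (timestamp, value) pairs and checks the webhook mismatch; threshold and duration are parameters, and all names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Sample = Tuple[float, int]  # (unix timestamp in seconds, observed value)


def breached_for(samples: List[Sample], threshold: int, duration_s: float) -> bool:
    """True if the most recent samples have all exceeded `threshold`
    for a stretch covering at least `duration_s` seconds."""
    streak_start = None
    ordered = sorted(samples)
    for timestamp, value in reversed(ordered):
        if value <= threshold:
            break  # the unbroken streak ends here
        streak_start = timestamp
    if streak_start is None:
        return False
    return ordered[-1][0] - streak_start >= duration_s


def webhooks_mismatch(health: Dict[str, Any]) -> bool:
    """True when registered and active system webhooks differ.

    systemWebhooks is only present on worker nodes, so a missing
    section is treated as 'no mismatch'.
    """
    hooks = health.get("systemWebhooks")
    if not hooks:
        return False
    return hooks["registered"] != hooks["active"]
```

For example, breached_for(samples, threshold=2000, duration_s=600) expresses a ten-minute warning rule on the input queue.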

Dashboard Visualization:

Line chart: inputQueue.messageCount over time
Gauge: Current input queue size vs thresholds
Table: Top slow queue flows
Counter: Total events, listeners, actions

Example cURL Request:

# Using API key authentication
curl -H "X-API-Key: your-api-key" https://api.appmixer.com/system/health

# Using JWT token with admin scope
curl -H "Authorization: Bearer your-jwt-token" https://api.appmixer.com/system/health
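
The same two requests can be issued from a script. This is a minimal sketch with the Python standard library; the header names mirror the cURL examples above, while the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_health_request(
    base_url: str,
    api_key: Optional[str] = None,
    jwt_token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a GET /system/health request with one of the two auth schemes."""
    if api_key:
        headers = {"X-API-Key": api_key}
    elif jwt_token:
        headers = {"Authorization": "Bearer " + jwt_token}
    else:
        raise ValueError("either api_key or jwt_token is required")
    return urllib.request.Request(base_url.rstrip("/") + "/system/health", headers=headers)


def fetch_health(request: urllib.request.Request, timeout_s: float = 10.0) -> Dict[str, Any]:
    """Send the request and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the authentication logic easy to test without a running instance.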

Integration with Monitoring Tools:

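Prometheus scrape configuration example. Note that /system/health returns JSON rather than the Prometheus text exposition format, so Prometheus cannot parse the response directly. Treat the job below as a template and point it at an exporter or proxy that translates the JSON into metrics. The bearer token must be a JWT with admin scope, because an API key is sent in the X-API-Key header rather than the Authorization header:

scrape_configs:
  - job_name: 'appmixer-health'
    metrics_path: '/system/health'
    scheme: https
    scrape_interval: 60s
    static_configs:
      # Replace with the address of the translating exporter if you use one
      - targets: ['api.appmixer.com']
    # Sent as "Authorization: Bearer <token>"
    bearer_token: 'your-jwt-token'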
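
Because /system/health returns JSON (see the response format above), its values have to be translated into the Prometheus text exposition format before Prometheus can ingest them. The sketch below shows one way a small custom exporter might do that translation; the metric names (appmixer_input_queue_messages and so on) are hypothetical and not defined by Appmixer.

```python
from typing import Any, Dict, List, Tuple


def health_to_prometheus(health: Dict[str, Any]) -> str:
    """Render selected /system/health fields as Prometheus gauge samples."""
    metrics: List[Tuple[str, str, Any]] = []  # (metric name, label text, value)

    if "messageCount" in health.get("inputQueue", {}):
        metrics.append(("appmixer_input_queue_messages", "", health["inputQueue"]["messageCount"]))
    if "count" in health.get("slowQueue", {}):
        metrics.append(("appmixer_slow_queue_flows", "", health["slowQueue"]["count"]))
    for key in ("events", "listeners", "actions"):
        if key in health:
            metrics.append(("appmixer_" + key, "", health[key]))
    by_type = health.get("eventsListeners", {}).get("byType", {})
    for kind, count in sorted(by_type.items()):
        metrics.append(("appmixer_event_listeners", '{type="%s"}' % kind, count))

    lines: List[str] = []
    declared = set()
    for name, labels, value in metrics:
        if name not in declared:
            lines.append("# TYPE %s gauge" % name)  # declare each metric once
            declared.add(name)
        lines.append("%s%s %s" % (name, labels, value))
    return "\n".join(lines) + "\n"
```

The returned text could be served on a /metrics path by any HTTP server and scraped as a regular Prometheus target.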

Flow Execution Metrics

This section will be expanded in a future update to include metrics for workflow execution times, success/failure rates, and active flow monitoring. For now, we recommend monitoring the inputQueue.messageCount and slowQueue metrics from the /system/health endpoint as primary indicators of flow execution health.

Component Performance

Detailed component-level performance monitoring guidance is planned for a future documentation update. In the meantime, standard Node.js application monitoring practices apply, and errors from individual connectors will appear in your Elasticsearch logs.

Infrastructure Monitoring

General Principles

For each infrastructure component, we recommend monitoring:

Availability - Is the service up and responding?
Performance - Response times, throughput, and latency
Resource Utilization - CPU, memory, disk, and network usage
Error Rates - Connection errors, timeouts, and failures
Capacity Planning - Trends for storage, connections, and load

MongoDB Monitoring

MongoDB is the primary data store for Appmixer. Monitor the following metrics:

Key Metrics

Database Performance

Query execution time (slow queries)
Operations per second (reads/writes)
Document scan rates
Index usage and efficiency

Replication (if using replica sets)

Replication lag
Oplog window
Member health status
Election events

Resource Usage

Memory utilization (resident and virtual)
Disk I/O (read/write operations)
Disk space usage and growth rate
Network throughput

Recommended Tools

MongoDB Atlas (for managed MongoDB)
MongoDB Cloud Manager / Ops Manager
Prometheus with MongoDB exporter
Datadog, New Relic, or similar APM tools

Alert Thresholds (Examples)

Replication lag > 10 seconds
Disk usage > 80%
Connection pool usage > 90%
Slow queries > 1000ms
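
As an illustration of the replication lag threshold, the sketch below derives per-member lag from the output of MongoDB's replSetGetStatus command, which a driver such as PyMongo returns as a dictionary with datetime values. It relies on the members, name, stateStr and optimeDate fields of that output; the function names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List


def replication_lag_seconds(status: Dict[str, Any]) -> Dict[str, float]:
    """Return lag behind the primary, in seconds, for each SECONDARY member."""
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY member in replica set status")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m.get("stateStr") == "SECONDARY"
    }


def lagging_members(status: Dict[str, Any], max_lag_s: float = 10.0) -> List[str]:
    """Names of secondaries whose lag exceeds the example 10 second threshold."""
    lags = replication_lag_seconds(status)
    return [name for name, lag in lags.items() if lag > max_lag_s]
```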

Redis Monitoring

Redis is used for caching. Monitor the following:

Key Metrics

Availability

Uptime
Master-replica sync status
Connection success rate

Recommended Tools

Redis INFO command
Prometheus with Redis exporter
RedisInsight
Cloud-native monitoring (if using managed Redis)
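
The output of the Redis INFO command is plain text made of "key:value" lines grouped under "# Section" headers, so the availability metrics above can be extracted with a few lines of parsing. The sketch below assumes the raw INFO text has already been fetched (for example with redis-cli INFO); the function names are hypothetical, while uptime_in_seconds, role, connected_slaves and master_link_status are standard INFO fields.

```python
from typing import Any, Dict


def parse_redis_info(raw: str) -> Dict[str, str]:
    """Parse raw INFO output into a flat dictionary of strings."""
    info: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, separator, value = line.partition(":")
        if separator:
            info[key] = value
    return info


def redis_availability(info: Dict[str, str]) -> Dict[str, Any]:
    """Summarize uptime and replication sync status."""
    role = info.get("role", "unknown")
    summary: Dict[str, Any] = {
        "role": role,
        "uptime_seconds": int(info.get("uptime_in_seconds", 0)),
    }
    if role == "slave":
        # A replica reports whether its link to the master is up
        summary["replication_link_up"] = info.get("master_link_status") == "up"
    else:
        summary["connected_replicas"] = int(info.get("connected_slaves", 0))
    return summary
```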

RabbitMQ Monitoring

RabbitMQ handles asynchronous message processing for Appmixer workflows and tasks.

Key Metrics

Queue Health

Queue length (messages ready)
Messages unacknowledged
Message rate (publish/deliver/ack)
Queue growth rate

Connection and Channels

Failed connection attempts

Node Health

Memory usage (high/low watermarks)
Disk space (free/used)

Cluster Health (if clustered)

Node availability
Network partition events
Mirrored queue synchronization

Recommended Tools

RabbitMQ Management Plugin
Prometheus with RabbitMQ exporter
Datadog, New Relic, or similar APM tools

Alert Thresholds (Examples)

Queue length growing beyond normal capacity
Memory usage > 80% of high watermark
No consumers on critical queues
Disk space < 20% free
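
Several of these thresholds can be checked against data from the RabbitMQ Management Plugin's HTTP API. The sketch below assumes the JSON from /api/queues (a list of queues) and one node entry from /api/nodes have already been fetched and parsed; it uses the messages_ready, consumers, mem_used and mem_limit fields. The function name, the queue names and the max_ready value are hypothetical and depend on your deployment. The disk space rule is omitted because it needs total disk size from the host.

```python
from typing import Any, Dict, List, Set


def rabbitmq_alerts(
    queues: List[Dict[str, Any]],
    node: Dict[str, Any],
    critical_queues: Set[str],
    max_ready: int,
) -> List[str]:
    """Return human-readable alerts for queue backlog, missing consumers
    and memory pressure."""
    alerts: List[str] = []
    for queue in queues:
        name = queue["name"]
        ready = queue.get("messages_ready", 0)
        if ready > max_ready:
            alerts.append("%s: %d messages ready" % (name, ready))
        if name in critical_queues and queue.get("consumers", 0) == 0:
            alerts.append("%s: no consumers" % name)
    # mem_limit is the memory high watermark in bytes
    mem_limit = node.get("mem_limit", 0)
    if mem_limit > 0 and node.get("mem_used", 0) / mem_limit > 0.80:
        alerts.append("memory usage above 80% of high watermark")
    return alerts
```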

Elasticsearch Monitoring

Elasticsearch stores Appmixer logs and makes them searchable.

Key Metrics

Cluster Health

Cluster status (green/yellow/red)
Number of nodes

Resource Usage

JVM heap usage
JVM garbage collection time
CPU usage per node
Disk I/O per node

Storage

Total disk space used
Index size growth rate

Recommended Tools

Kibana Monitoring
Elasticsearch Monitoring API
Prometheus with Elasticsearch exporter
Cloud-native monitoring (if using managed Elasticsearch)

Alert Thresholds (Examples)

Cluster status = red or yellow for > 5 minutes
JVM heap usage > 85%
Disk usage > 85%
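
The example thresholds map onto two Elasticsearch APIs: the cluster health API (status field) and the node stats API (jvm.mem.heap_used_percent and fs.total per node). The sketch below evaluates already parsed responses from both; the function name and the unhealthy_for_s parameter, which the caller must track between polls, are assumptions.

```python
from typing import Any, Dict, List


def elasticsearch_alerts(
    cluster_health: Dict[str, Any],
    node_stats: Dict[str, Any],
    unhealthy_for_s: float,
) -> List[str]:
    """Apply the example thresholds to cluster health and node stats."""
    alerts: List[str] = []
    status = cluster_health.get("status")
    if status in ("red", "yellow") and unhealthy_for_s > 300:
        alerts.append("cluster status %s for more than 5 minutes" % status)
    for node in node_stats.get("nodes", {}).values():
        name = node.get("name", "unknown")
        heap = node["jvm"]["mem"]["heap_used_percent"]
        if heap > 85:
            alerts.append("%s: JVM heap at %d%%" % (name, heap))
        fs_total = node["fs"]["total"]
        used_pct = 100.0 * (1 - fs_total["available_in_bytes"] / fs_total["total_in_bytes"])
        if used_pct > 85:
            alerts.append("%s: disk usage at %.0f%%" % (name, used_pct))
    return alerts
```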

Logstash Monitoring

Logstash processes and transforms log data before sending it to Elasticsearch.

Key Metrics

Pipeline Performance

Events received/filtered/sent

Recommended Tools

Logstash Monitoring API
Kibana Monitoring
Prometheus with Logstash exporter

Alert Thresholds (Examples)

JVM heap usage > 85%
Pipeline events duration increasing
Dead letter queue growing
Plugin errors increasing
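
"Pipeline events duration increasing" can be tracked as the average processing time per event between two polls of the Logstash node stats API. The sketch below assumes two parsed snapshots of that API and uses the per-pipeline events.out and events.duration_in_millis counters; the function name and the default pipeline name "main" are assumptions.

```python
from typing import Any, Dict, Optional


def millis_per_event(
    previous: Dict[str, Any],
    current: Dict[str, Any],
    pipeline: str = "main",
) -> Optional[float]:
    """Average milliseconds spent per emitted event between two snapshots.

    Returns None when no events were emitted in the interval (or the
    counters were reset by a restart).
    """
    prev_events = previous["pipelines"][pipeline]["events"]
    curr_events = current["pipelines"][pipeline]["events"]
    emitted = curr_events["out"] - prev_events["out"]
    if emitted <= 0:
        return None
    spent = curr_events["duration_in_millis"] - prev_events["duration_in_millis"]
    return spent / emitted
```

Plotting this value over time, or alerting when it rises well above its usual level, gives an early sign of a filter or output plugin slowing down.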

OpenShift Platform Monitoring

When Appmixer runs on OpenShift, use the platform's built-in monitoring capabilities:

Key Aspects

Pod Health

Pod status (Running/Failed/Pending)
Restart counts
Container resource usage vs limits

Resource Quotas

Namespace CPU/memory usage
Storage usage
Pod count vs limits

Network

Service availability
Ingress/route response times
Network policy effectiveness

Persistent Volumes

Volume usage
Volume performance metrics
Volume mount issues

Tools

OpenShift Monitoring (Prometheus-based)
OpenShift Web Console
oc CLI monitoring commands
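
As a sketch of what a scripted pod health check could look like, the function below inspects the parsed output of "oc get pods -o json" (the same structure as "kubectl get pods -o json") and reports pods in an unexpected phase or with many restarts. The function name and the default restart limit of 5 are assumptions.

```python
from typing import Any, Dict, List


def pod_problems(pod_list: Dict[str, Any], max_restarts: int = 5) -> List[str]:
    """List pods that are not Running/Succeeded or restart too often."""
    problems: List[str] = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            problems.append("%s: phase %s" % (name, phase))
        # Sum restarts across all containers in the pod
        restarts = sum(c.get("restartCount", 0) for c in status.get("containerStatuses", []))
        if restarts > max_restarts:
            problems.append("%s: %d restarts" % (name, restarts))
    return problems
```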

Recommended Monitoring Stack

Option 1: Prometheus + Grafana (Open Source)

Prometheus for metrics collection
Grafana for visualization and dashboards
Alertmanager for alert routing
Exporters for each infrastructure component
Custom exporters for Appmixer application metrics

Option 2: Commercial APM Solutions

New Relic
Datadog
Dynatrace
AppDynamics

Option 3: Hybrid Approach

Use OpenShift's built-in Prometheus for infrastructure
Add custom Grafana dashboards
Integrate with existing enterprise monitoring tools

Alerting Strategy

Alert Severity Levels

Critical - Immediate action required, service degradation or outage

Production outage
Data loss risk
Security breach

Warning - Attention needed, potential issues developing

Resource usage approaching limits
Performance degradation
Increased error rates

Info - Informational, no immediate action needed

Deployment notifications
Configuration changes
Capacity planning indicators

Alert Best Practices

Define clear runbooks for each alert
Avoid alert fatigue by tuning thresholds
Use alert aggregation to reduce noise
Implement escalation policies
Test alerting channels regularly

Logging Strategy

Log Aggregation

Set appropriate log retention policies based on compliance requirements and storage capacity

Log Levels

Use appropriate log levels:

ERROR - Application errors requiring attention
WARN - Warning conditions
INFO - Informational messages
DEBUG - Detailed diagnostic information (non-production)

Key Logs to Monitor

Application startup/shutdown events
Authentication and authorization failures
API request/response logs (with sampling)
Integration connector errors
Database connection issues
Message queue processing errors

Performance Tuning

Based on monitoring data, consider these tuning areas:

Node.js Application

Adjust worker thread pool size
Optimize memory limits and heap size
Enable clustering for horizontal scaling
Review and optimize slow database queries

Kubernetes Resources

Right-size CPU and memory requests/limits
Configure horizontal pod autoscaling (HPA)
Implement pod disruption budgets
Optimize persistent volume performance class

Capacity Planning

Regular capacity planning should review:

Growth Trends

User/tenant growth rate
Flow execution volume trends
Data storage growth

Resource Utilization

Average and peak CPU/memory usage
Database storage growth
Network bandwidth utilization

Performance Baselines

Establish performance baselines
Track deviation from baselines
Plan scaling activities before limits are reached
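
One simple way to track deviation from a baseline is to compare a new observation against the mean and standard deviation of a reference period. The sketch below uses the Python statistics module; the function name and the three-sigma default are assumptions, and other methods (percentiles, seasonal baselines) may suit some metrics better.

```python
import statistics
from typing import Sequence


def deviates_from_baseline(
    baseline: Sequence[float],
    observed: float,
    max_sigma: float = 3.0,
) -> bool:
    """True when `observed` lies more than `max_sigma` population standard
    deviations away from the baseline mean."""
    mean = statistics.fmean(baseline)
    spread = statistics.pstdev(baseline)
    if spread == 0:
        return observed != mean  # a flat baseline: any change is a deviation
    return abs(observed - mean) / spread > max_sigma
```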

Compliance and Security Monitoring

Monitor access logs for suspicious activity
Track failed authentication attempts
Review audit logs for compliance requirements
Track SSL certificate expiration dates
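
Certificate expiry can be tracked with the Python standard library alone. The sketch below reads the notAfter field of a server certificate and converts it to days remaining; the function names are hypothetical, and which hosts and warning window to use is left to the operator.

```python
import socket
import ssl


def fetch_not_after(host: str, port: int = 443, timeout_s: float = 5.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'.

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now_ts: float) -> float:
    """Days between `now_ts` (unix seconds) and the certificate expiry."""
    return (ssl.cert_time_to_seconds(not_after) - now_ts) / 86400.0
```

A daily job could call days_until_expiry(fetch_not_after("api.appmixer.com"), time.time()) and raise a warning when the result drops below a chosen number of days.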
