Best Practices
Running Terrateam in production requires thoughtful infrastructure decisions to ensure reliability, security, and scalability. This guide outlines best practices specific to Terrateam deployments.
Database (PostgreSQL)
Terrateam requires a PostgreSQL database to store state and workflow metadata. Any PostgreSQL deployment works, whether self-hosted or managed.
- Retain at least 7 days of backup history so you can recover from data loss or corruption.
- Enable point-in-time recovery (PITR) if your database provider supports it.
- Run a replica that can be promoted to primary for failover support.
- Restrict network access to ensure only the Terrateam server can connect.
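The retention guideline above can be checked automatically. The following is a minimal sketch, assuming you can list backup completion times from your database provider; the function name and the one-day freshness threshold are illustrative assumptions, not part of Terrateam.

```python
from datetime import datetime, timedelta, timezone


def backup_history_ok(backup_times, now,
                      min_history=timedelta(days=7),
                      max_age=timedelta(days=1)):
    """Return True if backups cover at least `min_history` and the newest is fresh.

    `max_age` (one day) is an assumed freshness threshold; adjust it to
    match your own backup schedule.
    """
    if not backup_times:
        return False
    oldest, newest = min(backup_times), max(backup_times)
    return (now - oldest) >= min_history and (now - newest) <= max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical data: one backup per day for the last eight days.
    daily = [now - timedelta(days=d) for d in range(8)]
    print(backup_history_ok(daily, now))
```

A check like this only confirms that backups exist; restoring one into a scratch database is still the only real test.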
Terrateam Server Deployment
Terrateam servers are stateless and can be horizontally scaled.
- Run multiple instances to avoid a single point of failure.
- All instances must point to the same PostgreSQL database.
- Use a process manager like systemd or supervisord, or an orchestrator like Kubernetes.
- Ensure logs are collected for debugging and auditing.
Load Balancer and Health Checks
A load balancer is required to distribute traffic across multiple Terrateam servers.
- Configure the load balancer to check the /health endpoint.
- Use connection draining to ensure requests finish before stopping an instance.
- Enable automatic failover so unhealthy instances are removed from rotation.
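To illustrate what a health check does, here is a minimal sketch of the probe a load balancer performs. It assumes a healthy instance answers /health with HTTP 200, which this guide does not state; the instance URLs are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200.

    Treating 200 as "healthy" is an assumption; confirm the actual
    response of your Terrateam version before relying on it.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response.
        return False


def healthy_instances(base_urls, probe=is_healthy):
    """Keep only the instances that pass the probe (the ones left in rotation)."""
    return [url for url in base_urls if probe(url)]


if __name__ == "__main__":
    # Hypothetical internal addresses.
    servers = ["http://10.0.1.10:8080", "http://10.0.1.11:8080"]
    print(healthy_instances(servers))
```

Production load balancers usually require several consecutive failures before removing an instance, which avoids flapping on a single slow response.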
Rolling Restarts and Zero Downtime Deployments
Since Terrateam is stateless, updates can be deployed without downtime.
- Use rolling restarts so that no instance is taken out of service before its replacement is ready.
- Load balancers should have a deregistration delay so active requests complete before an instance stops.
- Always monitor the /health endpoint after deploying changes.
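The rolling restart procedure above can be sketched as a loop that handles one instance at a time. This is an illustration only: `drain`, `restart`, and `is_healthy` are hypothetical callables you would supply for your own load balancer and process manager.

```python
import time


def rolling_restart(instances, drain, restart, is_healthy,
                    timeout=60.0, interval=1.0, sleep=time.sleep):
    """Restart instances one at a time, aborting if one fails to recover.

    drain(instance):      stop routing new requests and let active ones finish
    restart(instance):    restart the server process or container
    is_healthy(instance): True once the instance passes its health check
    """
    for instance in instances:
        drain(instance)
        restart(instance)
        waited = 0.0
        while not is_healthy(instance):
            if waited >= timeout:
                # Stop here so the remaining instances keep serving traffic.
                raise RuntimeError(
                    f"{instance} did not become healthy; aborting rollout")
            sleep(interval)
            waited += interval
```

Aborting on the first failure leaves the untouched instances running the previous version, which is what makes a rollback possible.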
Security Hardening
Terrateam interacts with infrastructure credentials and GitHub repositories, so security should be a priority.
- Use the principle of least privilege for database and cloud access.
- Store credentials securely with a secret manager or environment variables.
- Restrict access to Terrateam’s API to trusted sources only.
- Use TLS to encrypt database and API traffic.
- Regularly rotate database and GitHub app credentials.
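When credentials are supplied through environment variables, it helps to fail fast at startup if any are missing. The sketch below shows one way to do that in a wrapper or deployment script; the variable names are placeholders, not Terrateam's actual configuration keys.

```python
import os

# Placeholder names for illustration only.
REQUIRED_SECRETS = ("EXAMPLE_DB_PASSWORD", "EXAMPLE_GITHUB_APP_KEY")


def load_secrets(names=REQUIRED_SECRETS, env=None):
    """Read required secrets from the environment, failing fast if any are unset.

    The error message lists only the missing names, never any values,
    so it is safe to log.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing required secrets: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

A secret manager can populate these variables at launch, which keeps the values out of configuration files and version control.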
Scaling Considerations
Terrateam scales based on repository activity and concurrent workflows.
- Add server instances to handle higher request volumes.
- The database is usually the main bottleneck, since every instance shares it.
- Consider auto-scaling based on CPU/memory usage or request volume.
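As an illustration of CPU-based auto-scaling, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler: scale the instance count by the ratio of observed to target utilization. The minimum of 2 and maximum of 10 are assumed bounds, chosen so that at least two instances always run.

```python
import math


def desired_instances(current, observed_cpu, target_cpu, minimum=2, maximum=10):
    """Return the instance count that would bring CPU usage back to the target.

    Computes ceil(current * observed / target), clamped to [minimum, maximum].
    `observed_cpu` and `target_cpu` must use the same unit (for example percent).
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    wanted = math.ceil(current * observed_cpu / target_cpu)
    return max(minimum, min(maximum, wanted))


if __name__ == "__main__":
    # 3 instances at 90% CPU with a 60% target.
    print(desired_instances(3, 90, 60))  # → 5
```

Because all instances share one PostgreSQL database, adding servers only helps while the database has spare capacity.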
Disaster Recovery
Have a plan to restore operations if something goes wrong.
- Regularly test database backups.
- Maintain a rollback strategy for failed deployments.
- Use multiple availability zones if hosting Terrateam in the cloud.
- Document recovery procedures for quick response during incidents.
GitHub Webhook Reliability
Terrateam relies on GitHub webhooks for processing infrastructure changes.
- Ensure the Terrateam server is accessible from GitHub’s webhook IPs.
- Monitor webhook delivery logs in GitHub to catch failures early.
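GitHub publishes the IP ranges it sends webhooks from under the `hooks` key of its `https://api.github.com/meta` endpoint. The sketch below shows how those ranges could be fetched and used to check a source address, for example when building firewall rules. The function names are illustrative.

```python
import ipaddress
import json
import urllib.request


def fetch_github_hook_ranges(url="https://api.github.com/meta", timeout=5.0):
    """Fetch the CIDR ranges GitHub lists under the "hooks" key."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["hooks"]


def is_from_github(source_ip, cidrs):
    """Return True if `source_ip` falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    ranges = fetch_github_hook_ranges()
    # 203.0.113.7 is a documentation-only address, used here as a sample input.
    print(is_from_github("203.0.113.7", ranges))
```

The published ranges change over time, so refresh them periodically instead of hard-coding them.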
Following these best practices ensures a stable, secure, and scalable Terrateam deployment.