Sharing our experience building complex environments (DevOps project)

Recently, Team Work Spirit designed and implemented development and production environments using an unusual approach and combination of technologies, and we’d love to share this experience with other teams considering building a complex architecture in AWS.

Fundopolis, our client, has built a whole new approach to investing: a new financial system and community that makes investing simpler, more approachable and more enjoyable, so that everyone and anyone can invest in the people and ideas that matter to them.

Our task as the DevOps team was to define and start building the hosting setup for development, staging, production, and CI/CD, all in the cloud.

Technologies used in the project:

– AWS (VPC, CloudTrail, IAM, ECS, ELB, CloudWatch, EC2, ASG, ECR, RDS, S3, S3 Glacier, KMS, DAX, Route 53, CloudFront, CloudFormation, AWS Backup, DynamoDB);
– Jenkins;
– Apache ModSecurity proxy;
– OpenVPN AS to manage access to different parts of the app;
– Cloudflare to protect and accelerate the web app online;
– Docker;
– Graylog;
– Zabbix;
– Grafana;
– Varnish;
– webhooks for third-party services (Auth0, Netki).

The project was delivered in 3 phases:

1) analysis of the hosting needs based on the architecture outline and deployment plan, and creation of a hosting plan specifying which services to acquire for each environment, the security configuration, and an outline plan for future scaling;

2) implementation of a development environment using infrastructure as code;

3) creation of CI/CD pipelines and implementation of the production environment using infrastructure as code.

Some issues we faced and the solutions we implemented:

1) Monitoring. We decided to use Zabbix as a third-party monitoring system for all nodes in order to get more metrics than CloudWatch provides. Unfortunately, our ECS cluster instances had to run Amazon Linux, and Zabbix doesn’t provide an official agent package for it. We worked around this with Docker: the Zabbix agent runs in a Docker container on each ECS cluster instance. This let us use the standard image instead of compiling the agent from the Zabbix source code.
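A minimal sketch of that workaround, assuming the official `zabbix/zabbix-agent` image from Docker Hub and expressing the ECS objects as Python dicts in the JSON shape of the RegisterTaskDefinition and CreateService API inputs. The server hostname, image tag, family and service names are hypothetical placeholders, and agent settings beyond the server address are left out.

```python
import json

# Hypothetical values for illustration: point these at your own Zabbix
# server and pin the image tag that matches your server version.
ZABBIX_SERVER = "zabbix.internal.example.com"
AGENT_IMAGE = "zabbix/zabbix-agent:latest"


def zabbix_agent_task_definition(server: str, image: str = AGENT_IMAGE) -> dict:
    """ECS task definition (RegisterTaskDefinition input) for the stock agent image."""
    return {
        "family": "zabbix-agent",
        # Share the instance's network and PID namespaces so the agent sees
        # the host's interfaces and processes, not just its own container.
        "networkMode": "host",
        "pidMode": "host",
        "requiresCompatibilities": ["EC2"],
        "containerDefinitions": [
            {
                "name": "zabbix-agent",
                "image": image,
                "essential": True,
                "memory": 128,
                "privileged": True,
                # The official image reads the Zabbix server address from this variable.
                "environment": [{"name": "ZBX_SERVER_HOST", "value": server}],
                # Passive-check port that the Zabbix server polls.
                "portMappings": [{"containerPort": 10050, "hostPort": 10050, "protocol": "tcp"}],
            }
        ],
    }


def daemon_service(cluster: str) -> dict:
    """ECS service (CreateService input) placing one agent task on every instance."""
    return {
        "cluster": cluster,
        "serviceName": "zabbix-agent",
        "taskDefinition": "zabbix-agent",
        "launchType": "EC2",
        "schedulingStrategy": "DAEMON",
    }


if __name__ == "__main__":
    print(json.dumps(zabbix_agent_task_definition(ZABBIX_SERVER), indent=2))
    print(json.dumps(daemon_service("dev-cluster"), indent=2))
```

The `DAEMON` scheduling strategy makes ECS keep one agent task on every container instance, including instances the Auto Scaling group adds later, so nothing has to be installed on the host itself.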

2) IAM roles and policies. It took a lot of time to design the IAM roles that distribute access rights correctly to specific users. Users had to switch roles between accounts, so each role had to be assumable only as intended, with policies stating which user can assume the role and from which account.
We had three different accounts, each with its own services; these services needed access to each other, and that access often had to be limited.
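Cross-account role switching in AWS has two halves: a trust policy on the role in the target account, naming which account may assume it, and an identity policy in the source account allowing `sts:AssumeRole` on that specific role. The sketch below builds both as Python dicts ready for `json.dumps`. The account IDs and role name are hypothetical, and the MFA condition is an optional hardening assumption, not part of the setup described above.

```python
import json

# Hypothetical 12-digit account IDs and role name, for illustration only.
TOOLING_ACCOUNT = "111111111111"
PROD_ACCOUNT = "222222222222"
ROLE_NAME = "ProdReadOnly"


def trust_policy(trusted_account: str, require_mfa: bool = True) -> dict:
    """Trust policy attached to the role in the target account.

    It states which account may assume the role; users in that account
    still need an identity policy that allows sts:AssumeRole on it.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        # Optional: only allow the switch for MFA-authenticated sessions.
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def assume_role_policy(target_account: str, role_name: str) -> dict:
    """Identity policy for a user or group in the trusted account,
    allowing it to assume exactly one named role in the target account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": f"arn:aws:iam::{target_account}:role/{role_name}",
            }
        ],
    }


if __name__ == "__main__":
    # Role lives in PROD_ACCOUNT and trusts TOOLING_ACCOUNT.
    print(json.dumps(trust_policy(TOOLING_ACCOUNT), indent=2))
    # Attached to the user/group in TOOLING_ACCOUNT.
    print(json.dumps(assume_role_policy(PROD_ACCOUNT, ROLE_NAME), indent=2))
```

Keeping the identity policy's `Resource` down to a single role ARN, rather than a wildcard, is what limits each user to the roles they actually need in each account.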

3) KMS. We integrated KMS with S3 and IAM: only specific users should have access to the data stored in S3 (as defined by IAM roles and policies), and the data is encrypted with KMS keys. The data is also replicated to another region and encrypted as part of the replication. Because replication is involved, two keys are used: users get access to the key that decrypts the files, and a separate key encrypts the data replicated to the other region.
We created a CloudFormation template that specifies both KMS keys: the key for decryption and the key for encrypting the files replicated into the other region.
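A trimmed-down CloudFormation template along those lines, generated from Python to keep it compact. It is a sketch under assumptions, not the project's template: the replica bucket, the replica key in the destination region and the replication role are passed in as hypothetical parameters, and the role's own policy (which needs `kms:Decrypt` on the source key and `kms:Encrypt` on the replica key) is not shown.

```python
import json


def replicated_bucket_template() -> dict:
    """CloudFormation template: SSE-KMS bucket replicated to another region."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Hypothetical inputs: the replica bucket and its KMS key live in
            # the destination region.
            "ReplicaBucketArn": {"Type": "String"},
            "ReplicaKeyArn": {"Type": "String"},
            # Role that S3 assumes to copy objects.
            "ReplicationRoleArn": {"Type": "String"},
        },
        "Resources": {
            "SourceKey": {
                "Type": "AWS::KMS::Key",
                "Properties": {
                    "Description": "Encrypts objects in the source bucket",
                    "KeyPolicy": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                # Let IAM policies in this account control key access.
                                "Effect": "Allow",
                                "Principal": {"AWS": {"Fn::Sub": "arn:aws:iam::${AWS::AccountId}:root"}},
                                "Action": "kms:*",
                                "Resource": "*",
                            }
                        ],
                    },
                },
            },
            "SourceBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Replication requires versioning on both buckets.
                    "VersioningConfiguration": {"Status": "Enabled"},
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {
                                "ServerSideEncryptionByDefault": {
                                    "SSEAlgorithm": "aws:kms",
                                    "KMSMasterKeyID": {"Fn::GetAtt": ["SourceKey", "Arn"]},
                                }
                            }
                        ]
                    },
                    "ReplicationConfiguration": {
                        "Role": {"Ref": "ReplicationRoleArn"},
                        "Rules": [
                            {
                                "Id": "replicate-encrypted-objects",
                                "Status": "Enabled",
                                "Prefix": "",
                                # SSE-KMS objects are only replicated when opted in.
                                "SourceSelectionCriteria": {
                                    "SseKmsEncryptedObjects": {"Status": "Enabled"}
                                },
                                "Destination": {
                                    "Bucket": {"Ref": "ReplicaBucketArn"},
                                    # Replicas are re-encrypted with the destination-region key.
                                    "EncryptionConfiguration": {"ReplicaKmsKeyID": {"Ref": "ReplicaKeyArn"}},
                                },
                            }
                        ],
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(replicated_bucket_template(), indent=2))
```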

4) ECS. All the services used in the environment are launched via EC2 user data, so the user data is critical and requires close attention to detail whenever we create a CloudFormation template.
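Because user data is a shell script embedded in the template, two mistakes are easy to make: forgetting to register the instance with the right cluster, and mixing up CloudFormation `Fn::Sub` placeholders with shell variables. The sketch below shows minimal launch-template user data for an ECS instance plus a small pre-deployment check for unintended `${...}` references. `EcsAmiId` and `ClusterName` are hypothetical template names, and the checker is a rough heuristic for illustration, not a CloudFormation feature.

```python
import re

# Shell script run once at first boot. Inside Fn::Sub, ${ClusterName} is
# replaced by CloudFormation, while ${!HOSTNAME} is emitted as a literal
# ${HOSTNAME} for bash to expand on the instance.
USER_DATA = """#!/bin/bash
set -euo pipefail
# Join the intended ECS cluster; without this the agent registers to "default".
echo "ECS_CLUSTER=${ClusterName}" >> /etc/ecs/ecs.config
logger -t user-data "joined ${ClusterName} from ${!HOSTNAME}"
"""


def launch_template() -> dict:
    """AWS::EC2::LaunchTemplate resource carrying the user data.
    EcsAmiId is a hypothetical template parameter."""
    return {
        "Type": "AWS::EC2::LaunchTemplate",
        "Properties": {
            "LaunchTemplateData": {
                "ImageId": {"Ref": "EcsAmiId"},
                # Launch templates expect base64-encoded user data.
                "UserData": {"Fn::Base64": {"Fn::Sub": USER_DATA}},
            }
        },
    }


def unresolved_refs(script: str, known: set[str]) -> list[str]:
    """List ${...} names that Fn::Sub would try to resolve but that are not
    known template names. Escaped ${!...} and AWS:: pseudo-parameters are
    skipped; for ${Resource.Attr}, only the part before the dot is checked."""
    names = re.findall(r"\$\{(?!!)([^}]+)\}", script)
    return [n for n in names if n.split(".")[0] not in known and not n.startswith("AWS::")]
```

For example, `unresolved_refs(USER_DATA, {"ClusterName"})` returns an empty list, while `unresolved_refs('echo ${HOSTNAME}', {"ClusterName"})` returns `['HOSTNAME']`, flagging a shell variable that `Fn::Sub` would otherwise try, and fail, to resolve.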

Much of the project’s complexity comes from the need for three accounts: Jenkins runs CI/CD jobs for all of them, and Graylog collects logs from all the services in these three accounts.
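One way to get container logs from all three accounts into a single Graylog instance is Docker’s `gelf` log driver, which ECS supports on EC2 instances. A minimal sketch, assuming a GELF UDP input on a Graylog host reachable from every account (the address below is hypothetical). Note that the ECS agent only accepts this driver if `gelf` is listed in `ECS_AVAILABLE_LOGGING_DRIVERS` in `/etc/ecs/ecs.config`. This is an illustration, not necessarily how the project shipped its logs.

```python
import json

# Hypothetical GELF UDP input on the central Graylog server; it must be
# reachable from the instances in all three accounts.
GRAYLOG_GELF = "udp://graylog.internal.example.com:12201"


def with_gelf_logging(container: dict, gelf_address: str = GRAYLOG_GELF) -> dict:
    """Return a copy of an ECS container definition whose stdout/stderr
    is shipped to Graylog through Docker's gelf log driver."""
    updated = dict(container)  # shallow copy; the input dict is left untouched
    updated["logConfiguration"] = {
        "logDriver": "gelf",
        "options": {
            "gelf-address": gelf_address,
            # Tag messages with the container name so they can be filtered in Graylog.
            "tag": container["name"],
        },
    }
    return updated


if __name__ == "__main__":
    api = {"name": "api", "image": "example/api:1.0", "memory": 256, "essential": True}
    print(json.dumps(with_gelf_logging(api), indent=2))
```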