As our team of developers started growing, we needed CI/CD to be able to deliver reliably. App developers consume our APIs and need to be able to test against the latest version. Data structures change frequently, so we need to ensure data integrity between commits, and comprehensive automatic testing needs to be in place to keep the code base stable.
We have defined a number of testing phases:
- Regular unit testing with mocks. Our preferred way to test different paths quickly.
- Testing integration points against mock HTTP responses with WireMock
- Testing integration points against actual endpoints (optional)
- Testing with the Spring container, a database and mocks. Mostly for testing JPA
- Integration tests. The app deployed in a real container with everything in place
- Testing Flyway migrations
- Smoke testing (only on a dedicated server when releasing, with prod-like data)
Each of these testing phases has a profile defined in Maven, and a couple of them are optional.
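As a rough sketch, wiring one Maven profile per phase might look like the fragment below. The profile IDs and the choice of Failsafe for the optional endpoint tests are illustrative, not our exact configuration:

```xml
<!-- Illustrative pom.xml fragment: one profile per testing phase. -->
<profiles>
  <!-- Fast unit tests with mocks: active by default on every build. -->
  <profile>
    <id>unit-tests</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
  </profile>
  <!-- Optional phase: integration points against actual endpoints. -->
  <profile>
    <id>endpoint-tests</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-failsafe-plugin</artifactId>
          <executions>
            <execution>
              <goals>
                <goal>integration-test</goal>
                <goal>verify</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

An optional phase is then run explicitly, e.g. `mvn verify -P endpoint-tests`, while a plain `mvn test` stays fast.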
Bamboo – Build Server
We are currently using Bamboo for CI. Every time someone commits, our suite of tests is executed and the result can be viewed in Bamboo. Bamboo uses a dedicated stock Atlassian EC2 instance to run its tests. This is documented at https://confluence.atlassian.com/display/BAMBOO/Configuring+Elastic+Bamboo
Once the tests have run to completion, we build a war file and upload it to AWS Elastic Beanstalk.
This means we’re not maintaining our build artifacts in Bamboo. Bamboo and AWS Beanstalk are both a bit opinionated about how to perform CD, but in the end we decided not to use any of Bamboo’s deployment features. In fact, we might replace it with Jenkins in the future.
Uploading the war file to AWS Beanstalk is easily performed by executing:

```shell
mvn beanstalk:upload-source-bundle beanstalk:create-application-version beanstalk:update-environment -s src/main/resources/maven-settings.xml
```
This also automatically deploys our latest artifact to our dev environment so it always has the latest snapshot version deployed.
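The `beanstalk:*` goals above come from the ingenieux beanstalk-maven-plugin. A minimal plugin declaration might look like the following; the application, environment and bucket names are placeholders, and the real values would typically come from properties or maven-settings.xml:

```xml
<!-- Sketch of a beanstalk-maven-plugin declaration; names are placeholders. -->
<plugin>
  <groupId>br.com.ingenieux</groupId>
  <artifactId>beanstalk-maven-plugin</artifactId>
  <configuration>
    <applicationName>my-app</applicationName>
    <environmentName>my-app-dev</environmentName>
    <s3Bucket>my-app-artifacts</s3Bucket>
  </configuration>
</plugin>
```

With this in place, the single `mvn` invocation uploads the bundle, registers it as a new application version and points the dev environment at it.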
We use beanstalk to deploy our app, see http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html
This takes care of load balancing, auto scaling etc and takes away a lot of our ops headaches. Since we stream logs to Logentries we rarely even have to use ssh to access the EC2 instances.
In principle every snapshot should be releasable to prod, this is one of the main principles of CD. We have, however, not yet been able to add sufficient automatic testing to reach such confidence levels.
We release what’s currently in dev using Atlassian’s wonderful Maven plugin called Jgitflow.
This follows the principles of gitflow, see http://nvie.com/posts/a-successful-git-branching-model/
(Jgitflow is also quite useful for hotfixes and feature branches)
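A sketch of how the plugin can be declared is below; the coordinates are the published Atlassian ones, but the options shown are only an illustration, not our exact setup:

```xml
<!-- Illustrative jgitflow-maven-plugin declaration. -->
<plugin>
  <groupId>external.atlassian.jgitflow</groupId>
  <artifactId>jgitflow-maven-plugin</artifactId>
  <configuration>
    <!-- Push release/hotfix branches and tags to origin automatically. -->
    <pushReleases>true</pushReleases>
    <pushHotfixes>true</pushHotfixes>
  </configuration>
</plugin>
```

A release is then cut with `mvn jgitflow:release-start` followed by `mvn jgitflow:release-finish`, which merges the release branch to master, tags it, and bumps the snapshot version on develop.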
Whenever anything is pushed to our master branch, we run a suite of smoke tests. These tests are performed on a prod-like database.
The artifact is uploaded to AWS Beanstalk and is then deployed manually to staging for further manual testing, and eventually to prod. We have not yet automated this last step of the release cycle, but we plan to do so in the future.
We are using AWS RDS PostgreSQL, which takes care of all our scaling and high-availability headaches. We briefly looked into the tools available for PostgreSQL but quickly decided we didn’t want to do this ourselves. We also plan to start using read-only replicas in the future for certain background jobs such as segmentation.
Database schema migration with Flyway
Ensuring we have a consistent database schema is tremendously important and is easy to get wrong. We also considered other tools such as Liquibase but decided Flyway is the easiest to use since it allows for plain SQL.
One challenge is how database changesets are versioned, since developers work in parallel. Sequence numbers are not an appropriate solution since they have to be coordinated, so we eventually landed on timestamps as version numbers.
On dev we need to allow out-of-order migrations because multiple developers push commits to the same code base. Out-of-order migrations are of course disallowed in prod, and this approach usually works well.
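Concretely, a timestamp-versioned migration might look like this; the file name, index and table are made up for illustration:

```sql
-- Illustrative migration: src/main/resources/db/migration/V20150310143000__add_customer_email_index.sql
-- The version (20150310143000) is the timestamp when the file was created,
-- so developers on parallel branches never fight over a sequence number.
CREATE INDEX idx_customer_email ON customer (email);
```

On dev, Flyway then runs with `outOfOrder=true` so a migration carrying an older timestamp that lands later is still applied; in prod it stays at the default `false`.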
One problem with this approach is that our app is currently monolithic and if we have major changes it may prevent rolling updates from working properly. This is one reason we want to work towards polyglot storage and/or microservices in the future so different parts of the system can be deployed independently with minimal downtime.
A wonderful tool that’s helped us a lot with monitoring is New Relic. This immediately tells us when our error rate goes beyond a certain threshold or we have major bottlenecks. Needless to say, this has helped us out a number of times.
Always fun to watch the cache go cold after a deployment…
(Funny, just as I’m writing this New Relic sent me an alert about a high number of 401s on prod, something strange going on…)
We have come a long way but still have a long way to go. The app started out as Java EE but is now mostly Scala, Spring and Akka. And because we are getting a lot more traction we need to prepare for scale. As a result we need to start splitting into microservices and consider polyglot storage. For quite a few things an RDBMS is not appropriate and we’re looking to use NoSQL instead, as we’re mostly just dealing directly with JSON.
And we need to be able to trace events much more efficiently. Perhaps not a fully fledged CQRS with event sourcing but something in between, we’ll see what we come up with.
As we split into smaller services we may eventually consider using Docker and perhaps try out another stack such as Play and Slick, but for the time being we’ll stick to Beanstalk and Tomcat. We’ve heard some slightly worrying stories (ulimit issues etc.) so we won’t try Docker just yet.
I guess we’ll have a lot more ops pain when we make this transition, but it should be worth it in the end…