5 lessons learned after 2 years at FXStreet IT Department

Javier Hertfelder
5 min read · May 20, 2017

After more than 2 years at FXStreet, with around 300,000 lines of code written and 30 services in production, I have decided to put together 5 lessons learned the hard way.

1- No matter what is happening, keep your SCRUM cycle intact

Mistake

During these 2 years, we have had bad times, emergencies, overwork and stressful moments. The easiest way to cope with hard times is to skip retrospectives, planning sessions, backlog grooming and even daily standups. We made those mistakes, and the consequences are devastating for the team and for the company. It is like going to the gym every day for 3 weeks and then skipping a week because you have more important things to do… mistake! You will never go back.

Lesson learned

Even if your services are crashing, the team must stick to its normal routine in order to build a habit around the methodology. Discipline is the key to a successful development team; habits only get internalized through discipline.

2- Integration tests are the key to sleeping well at night

Mistake

Over the past two years we have written thousands of unit tests, but we did not take integration tests seriously until we experienced problems with our deployments. As we all know, unit tests prevent silly mistakes, make the code less buggy and protect you from badly implemented refactors, but what really makes your system stable is integration/system tests.

Lesson learned

After some deployment problems, some late-night calls and, to be honest, some angry clients, we created a plan to improve our integration tests. Before going to production, make sure that all of your crucial functionality is covered by integration tests. Some parts (back-end processes) can be very difficult to test, but the time and headaches saved when a test fails right before going to production more than pay for the effort.
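The idea is language-agnostic; as a minimal sketch, an integration test in Python (pytest + requests) could look like the following. The base URL and the /calendar endpoint are hypothetical placeholders, not FXStreet's actual API.

```python
import os

import requests

# Hypothetical base URL of the deployed service under test.
BASE_URL = os.environ.get("API_BASE_URL", "https://staging.example.com")


def test_calendar_endpoint_is_healthy():
    # Hit the real HTTP endpoint of the deployed service,
    # not a mocked unit under test.
    response = requests.get(f"{BASE_URL}/calendar", timeout=10)

    # The deployment is only considered good if the service answers
    # with a successful status code and a JSON body.
    assert response.status_code == 200
    assert "application/json" in response.headers["Content-Type"]


def test_calendar_events_have_required_fields():
    response = requests.get(f"{BASE_URL}/calendar", timeout=10)
    events = response.json()

    # Fail the build if the crucial business data is missing.
    assert isinstance(events, list)
    for event in events:
        assert "date" in event
        assert "currency" in event
```

Run against a staging deployment as part of the release pipeline, a suite like this fails before the broken build ever reaches production.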

3- Load Testing is the most important thing before going online

Mistake

This may be the most important lesson we have ever learned. When we launched the new FXStreet website, we did load testing: we stressed the application with 1,000 to 3,000 users navigating simultaneously through all the sections of the site. But we made a HUGE mistake: we thought that 30-minute testing batches would be enough… well, they weren't. We went online and after 8 hours the site crashed.

Lesson learned

Load testing is a very easy task nowadays, at least in the Microsoft world: you can set up a stress test that simulates thousands of simultaneous users against your application in just a few minutes, so take advantage of it. Test, test and retest, analyse your results with New Relic or another monitoring tool, and run the tests for hours rather than minutes so that slow leaks and resource exhaustion have time to show up. Only when you are 95% (not less) sure that the application will handle traffic spikes should you go live. It is always better to wait a week longer than to go online and crash in front of your customers.
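Outside the Microsoft stack the same idea can be sketched with an open-source tool such as Locust. This is only an illustrative example; the routes and weights are hypothetical and should be replaced with your own site's pages.

```python
from locust import HttpUser, task, between


class SiteVisitor(HttpUser):
    # Each simulated user pauses 1-5 seconds between page views.
    wait_time = between(1, 5)

    @task(3)
    def view_homepage(self):
        # Most traffic hits the homepage.
        self.client.get("/")

    @task(1)
    def view_calendar(self):
        # Hypothetical section of the site; adjust to your own routes.
        self.client.get("/economic-calendar")
```

The important part is the duration: with a recent Locust version you can run something like `locust -f loadtest.py --host https://staging.example.com --headless -u 3000 --run-time 8h`, which is exactly the long, sustained run that a 30-minute batch will never cover.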

4- APIs are forever

Mistake

The first thing we did when we landed at the company was to rebuild the APIs that were in production at the time, the successful Calendar and Post APIs; both services were consumed by hundreds of clients. The first lesson learned the hard way was that once customers start building their applications and systems on top of your APIs, changing those APIs becomes impossible. After some production deployments and complaints from clients (you will never know in what fancy ways they consume your APIs), we learned that even the smallest change to the responses causes problems.

Lesson learned

The first thing you need to do is build a set of integration tests that ensure the responses do not change a single thing when you refactor your code. Then you are free to make any change you want in the application and infrastructure layers.
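As a rough sketch of this kind of response-contract check in Python: the endpoint, field names and snapshot file below are hypothetical, not the real Calendar API, but the pattern is the same for any public API.

```python
import json
import os

import requests

BASE_URL = os.environ.get("API_BASE_URL", "https://staging.example.com")

# Field names the API exposed when clients started integrating,
# stored once and committed alongside the tests.
SNAPSHOT_FILE = "calendar_response_contract.json"


def test_calendar_response_contract_is_unchanged():
    response = requests.get(f"{BASE_URL}/calendar", timeout=10)
    body = response.json()

    # Compare the field names (the contract), not the values, so that
    # new data never breaks the test but a renamed or removed field does.
    # Assumes the response is a list of event objects.
    current_fields = sorted(body[0].keys())

    with open(SNAPSHOT_FILE) as f:
        expected_fields = sorted(json.load(f))

    assert current_fields == expected_fields, (
        "API response shape changed; clients depending on these fields will break"
    )
```

With a test like this guarding every refactor, you can rework the internals as much as you like while the published contract stays frozen.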

5- Design your system around a Continuous Integration that works

Mistake

One of our first architectural decisions was to break the monolithic system that had been around at FXStreet for ages into "the new kid on the block", microservices. From our perspective this was crucial because microservices provide a high-level separation of concerns and decouple dependencies between systems. To implement this approach successfully you need a Continuous Integration infrastructure from the very beginning, and our mistake was to choose the wrong tool given our experience and the capacity of the team. We spent hundreds of hours trying to fix and properly configure the CI tool. After a lot of frustration (you can see my previous post about that journey), we switched to a CI tool that was far easier to use and maintain than the previous one.

Lesson learned

Choose the right tool based on the resources and knowledge you have. TeamCity, Jenkins and other CI software can be customized extensively, but without enough time, resources and knowledge they can turn into a nightmare. For those in the Microsoft environment, Visual Studio Online CI works perfectly without a lot of effort.
