DRP Organisation
A DRP (Disaster Recovery Plan) exercise is performed by the IT teams to validate the resilience of the bank's IT assets.
Context
​
A bank is a critical pillar of a country's economy, and its IT is the foundation of that pillar.
​
In case of a major incident, IT must be ready to fix things as quickly as possible. Banks, at least French ones, are legally required to test their IT resilience at least once for each datacenter they own.
The test consists of stopping part of the infrastructure (for example, powering down a datacenter) to simulate a real infrastructure outage. It shows whether the critical services can stay available despite the crash (for example, by switching from the stopped datacenter to another available one).
​
It usually takes place over a weekend, in two steps:
- FailOver: intentional stop of the infrastructure we wish to test.
- FailBack: return to the normal situation by restarting everything that was previously stopped.
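
As a rough illustration of these two steps, here is a minimal Python sketch. It is only a toy model, not the tooling we actually used: the classes, the service names and the datacenter names (dc-a, dc-b) are all hypothetical. It stops one datacenter, lists the critical services that would no longer be available, then restores the datacenter.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    critical: bool
    datacenters: set  # names of the datacenters where an instance runs


@dataclass
class Infrastructure:
    services: list
    stopped: set = field(default_factory=set)  # datacenters currently down

    def fail_over(self, datacenter: str) -> None:
        """FailOver: intentionally stop the datacenter under test."""
        self.stopped.add(datacenter)

    def fail_back(self, datacenter: str) -> None:
        """FailBack: restart the datacenter that was stopped."""
        self.stopped.discard(datacenter)

    def is_available(self, service: Service) -> bool:
        # A service survives if at least one of its datacenters is still up.
        return bool(service.datacenters - self.stopped)

    def unavailable_critical_services(self) -> list:
        return [
            s.name
            for s in self.services
            if s.critical and not self.is_available(s)
        ]


# Hypothetical inventory, for illustration only.
infra = Infrastructure(services=[
    Service("payments", critical=True, datacenters={"dc-a", "dc-b"}),
    Service("reporting", critical=True, datacenters={"dc-a"}),
    Service("wiki", critical=False, datacenters={"dc-a"}),
])

infra.fail_over("dc-a")
after_failover = infra.unavailable_critical_services()

infra.fail_back("dc-a")
after_failback = infra.unavailable_critical_services()

print("Critical services down after FailOver:", after_failover)
print("Critical services down after FailBack:", after_failback)
```

In this toy inventory, the critical service that runs in only one datacenter is the one the exercise would flag, which is exactly the kind of weakness a DRP is meant to reveal.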
​
Working in the Middleware/Container team, I was responsible for organising the largest datacenter DRP across the whole Middleware team (for SGCIB), before, during, and after the event.
​
​


​
The Project
​
Before
I had to prepare the DRP for my team:
- Analyse the impacted assets and create an up-to-date inventory of them.
- Communicate with the control tower and with the different Middleware teams.
- Take care of the logistics before the event (ordering meal trays, planning the weekend according to my coworkers’ availability).
- Check that all DRP tools would be available during the event.
​
During
I had to make sure the event ran smoothly:
- Manage the priorities of the incidents
- Manage my coworkers (stress, handling of customer requests)
- Above all, this means keeping communication going between coworkers so that we work as one efficient team.
- Communicate with the control tower
​
Here my role was twofold:
- Step back from production to keep a global view of the event. This required managing the event as a project and being the pillar of the team.
- Help fix what needed fixing, according to issue priority. This required a good understanding of production concepts and good knowledge of the assets we manage.
​
After
After the event, I had to write incident post-mortems where needed, to keep a record of each incident and explain how it had been definitively solved.
​
A great project to have taken part in, for sure!
I learnt a lot during this project, and I have had the opportunity to manage others since then.