Recovering from ransomware: One organisation’s inside story
On Sunday 21 February 2021, Manutan, a large office equipment distributor, discovered that two-thirds of its 1,200 servers had succumbed to a cyber attack by the DoppelPaymer ransomware crew.
Commercial activity at the France-headquartered company – which has 25 subsidiaries spread across Europe – would be frozen for 10 days and did not resume fully until May. This has now led to a total overhaul of its IT systems, which started in September and is set to take 18 months.
Manutan cannot reveal the scale of the economic losses it suffered in the cyber attack, and when asked that exact question, Jérôme Marchandiau, the group’s director of IT operations, says that the more profound impact was on the employees themselves.
“The psychological impact is the most terrible thing – nothing works,” he says. “It is like there has been a fire, but without the destruction of any physical property. It’s incomprehensible – 2,400 people found themselves unable to work overnight, not knowing when they will return to work, or even if they will return.
“When you come under such an attack from an invisible enemy, you are crippled by the thought of not knowing where the next blow will come from, so you don’t really tell anybody. You just ask your staff to wait,” he says.
We meet at the beginning of November in a restaurant in suburban Paris, far from the city’s bustle. Marchandiau believes that now the time has come to tell the story of how Microsoft dropped the ball, how Rubrik stepped in to save its bacon, how and why Manutan decided to resist the gang’s blackmail attempt, and how it had to surgically rebuild all of its IT.
Gathering dread
“It was a Sunday morning,” says Marchandiau. “The general service manager called me at 8:30 am because the badge systems were no longer working. It was an incident that could prevent access to our facilities, but not until Monday. Fine, so I waited to see how the situation evolved.
“At 10 am, I got another phone call from a developer, who called to tell me about something weird – he could no longer access a particular server. That’s strange. 10 minutes later, another developer called with a similar issue. That’s when I thought to myself that something was going on.”
Marchandiau phoned his systems manager, who looked first at the input-output curve on the server racks, then at the backup system logs. They got ‘lucky’ – something was wrong with both access to machine bays and backup system alerts.
“We therefore launched surveys on our machines. This took until Monday morning. Gradually, we realised that all of our Windows servers were crypto-locked, and with them the Pure Storage arrays they were using. Of our 1,200 servers, only the 400 Linux and Unix servers were intact, as well as the few very old servers that still ran Windows 2000 and 2003.”
At that point, only Manutan’s e-commerce websites were still up and running. “We could continue to sell, but we couldn’t cash the orders since the applications were no longer responding,” says Marchandiau. “In this case, their servers did only one thing: display a blue screen. Above was written a ransom demand – we did not know how much – and how to contact the criminals.”
Marchandiau and his systems manager got to work shutting everything down to prevent the malware from spreading further. In addition to the remaining unaffected servers, they switched off network gateways and disabled communications with subsidiaries, but the damage had been done.
“How ironic! We have two datacentres for redundancy, but in this situation it was pointless; since they worked together, their synchronisation had only served to contaminate each other,” he adds.
Dropped by Microsoft
A crisis team was swiftly stood up, amid a growing air of paranoia. What if the attack came from within? What if an employee made things worse through clumsiness? They decided that, until further notice, nobody could touch the information systems, and to inform staff only of the actions taken and the extent of the damage, with the exception of the executive committee, Marchandiau, and the security manager. The most urgent priority was to call for help.
“Early on Monday, we called Microsoft because we had a Premium support contract with them.” Marchandiau visibly tenses. “They weren’t up to the job!”
“It was February 22nd and we found that our support contract was ending on the 28th. Their priority was that we renew the contract before they would come to our aid,” he says, pale with rage.
And the help Microsoft was offering was appalling – it indicated it could intervene in a fortnight, or maybe three weeks, and presented a quote to investigate the source of the attack.
“We protested [but] they told us that none of this would have happened if we had updated their systems regularly,” says Marchandiau. “But with them, the updates come every week! How do they want us to update 800 servers every week? The effort needed is colossal – it is completely inconsistent!”
Don’t pay, play dead
Meanwhile, Manutan’s insurance provider had been contacted and directed the firm to a service provider specialising in cyber incident response and support. They came back within the hour, and for a price three times lower than Microsoft’s.
“They provided us with a battle plan, the elements of communication, and the procedures to notify the CNIL [Commission Nationale de l’Informatique et des Libertés, the French data protection authority],” says Marchandiau.
The service provider did not take long to get to the bottom of things. It quickly learned the attack had begun three months earlier, via a phishing email, although the identity of the staffer who was tricked into clicking the tainted link was kept secret, and not even communicated to Marchandiau.
As is common in ransomware attacks, the initial phish was used to deploy a bot and establish persistence. The ransomware gang subsequently returned, apparently somewhat by chance, rooted around in Manutan’s network, and determined the target was interesting to them.
“Our Windows machines were protected by Microsoft Advanced Threat Protection, Windows Defender, and BluVector Cortex. These tools had correctly generated alerts. But, at the time, we were coming out of a security audit.
“So, you know how it goes: we went on a three-year plan to secure what needed to be secured, that was a matter of course. In short, we had the means to identify the attack, but no longer the necessary vigilance,” says Marchandiau.
As far as the ransom demand itself was concerned, the service provider warned that it was important Manutan not respond, even more so that it not pay. In the case of this particular gang, as soon as the victim shows up to negotiate, the criminals activate a three-week timer at the end of which – if there is no resolution – they make good on a series of threats, disclosing the victim’s sensitive information and irreparably destroying the data.
Therefore, to pretend that Manutan had not yet realised it had been attacked – in effect, to play dead – would serve to buy it valuable time. In terms of actually paying, this could make the gang ask for more and would not provide any guarantee that the data would be recovered.
“We spent time determining what data they had recovered and the risk it posed. We concluded that it was not critical – for example, they did not access our contracts with suppliers. Then we evaluated our ability to put a functioning IT system back together, which we could do, and we decided that we would not pay,” says Marchandiau.
An emergency exit, courtesy of Rubrik
In an emergency, the reconstruction of a functioning IT system relies on backups, of which Manutan had deployed three – Rubrik for its servers, Veeam for Exchange, and NetBackup for databases.
“None of the backups were encrypted, but on the other hand, the Veeam and NetBackup data was unusable because the servers were running Windows. Only Rubrik, with its standalone appliances, was usable. But it meant we were therefore able to restore all our Windows servers,” says Marchandiau.
The Rubrik solution uses an appliance that is intended to be impenetrable and its backups are said to be immutable – which is to say that they are locked by the appliance’s firmware to exclude the possibility of external unlocking, either by an IT admin or the manufacturer itself, until a predefined expiry date at which point the backup becomes writable again. Ultimately, this backup is erased, since enough time has passed for more backups to be made and locked in turn.
“We even thought about protecting the clock of the host system, in order to prevent criminals from cheating by changing the current date to the expiration date,” says Pierre-François Guglielmi, Rubrik’s technical director, who tells LeMagIT that the appliance’s protection is tested three times daily at Rubrik’s headquarters. Additionally, Rubrik invites ethical hackers to test the solution with their latest attack techniques on a monthly basis.
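The retention lock described above can be illustrated with a toy model. This is a minimal sketch, not Rubrik's implementation: the class `ImmutableStore`, its methods and the 90-day retention period are all hypothetical. It shows only the principle that a snapshot cannot be overwritten or deleted until its expiry date, and that the expiry is checked against the store's own clock rather than a time supplied by the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class RetentionLockError(Exception):
    """Raised when a still-locked snapshot would be changed or deleted."""


@dataclass(frozen=True)
class Snapshot:
    data: bytes
    locked_until: datetime


class ImmutableStore:
    """Toy write-once store (hypothetical, for illustration only)."""

    def __init__(self, retention, clock=lambda: datetime.now(timezone.utc)):
        self._retention = retention
        # The clock belongs to the store; callers cannot pass in their own time.
        self._clock = clock
        self._snapshots = {}

    def _is_locked(self, name):
        snap = self._snapshots.get(name)
        return snap is not None and self._clock() < snap.locked_until

    def write(self, name, data):
        if self._is_locked(name):
            raise RetentionLockError(f"{name} is locked and cannot be overwritten")
        self._snapshots[name] = Snapshot(data, self._clock() + self._retention)

    def delete(self, name):
        if self._is_locked(name):
            raise RetentionLockError(f"{name} is locked and cannot be deleted")
        self._snapshots.pop(name, None)

    def read(self, name):
        return self._snapshots[name].data

    def names(self):
        return sorted(self._snapshots)
```

In this model, an attacker who gains write access to the store still cannot replace a locked snapshot with an encrypted copy. The only way around the lock is to tamper with the clock, which is the attack Guglielmi says Rubrik set out to prevent.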
Marchandiau recalls that Manutan had invested in Rubrik following a bake-off with Commvault for several reasons, ranging from autonomy to ease of administration and, above all, the system’s ability to navigate through the backed-up data without having to restore it.
“It was this feature that allowed us to know how far to go up through backups to restore healthy server systems and data from any infection. It’s quite simple: you go through the directories as if they were already restored and see if the files have strange names, which is a characteristic of crypto-lockers,” he explains.
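The check Marchandiau describes, browsing each backup and looking for strangely named files, could be automated along these lines. This is a sketch under stated assumptions: the suffix list is an illustrative placeholder rather than a catalogue of real ransomware extensions, and `latest_clean_snapshot` is a hypothetical helper that takes plain file listings, not a Rubrik API.

```python
from datetime import date
from pathlib import PureWindowsPath

# Illustrative placeholders only; real crypto-lockers use many different suffixes.
SUSPICIOUS_SUFFIXES = {".locked", ".encrypted", ".crypt"}


def looks_encrypted(path):
    """Return True if the file name ends in one of the suspicious suffixes."""
    return PureWindowsPath(path).suffix.lower() in SUSPICIOUS_SUFFIXES


def latest_clean_snapshot(snapshots):
    """Given {snapshot date: list of file paths}, return the most recent
    date whose listing contains no suspicious file names, or None."""
    for taken_on in sorted(snapshots, reverse=True):
        if not any(looks_encrypted(p) for p in snapshots[taken_on]):
            return taken_on
    return None
```

A clean listing shows only that files had not yet been encrypted when the snapshot was taken. It says nothing about a dormant implant, which is one reason every restored server was still scanned before being reconnected.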
Backtracking to images of the systems saved three months previously proved sufficient. But in case Manutan had needed more recent data, the service provider offered it some tools to attempt to decipher what exactly had been affected by the ransomware.
“We were surprised to see that these tools worked quite well for retrieving documents from file servers. However, they were unusable for databases,” says Marchandiau. Such tools can, incidentally, be downloaded free of charge from the likes of Avast and Kaspersky.
10 days and nights, non-stop work
The operational restoration of the servers was a long-term job, going one by one, server by server. Before pushing them live, Manutan procured and installed a new endpoint detection and response (EDR) solution on them to scan the disks for any known malware.
“Our insurer strongly encouraged us to take a solution that we had not used before, in this case SentinelOne’s EDR,” says Marchandiau. “So for each of the 800 corrupt servers, it worked like this: we restore it, we start it, we verify that it works, we install SentinelOne on it, we scan it completely, then we connect it to the network and we pass to the next server.
“There were only three of us to do this work. After 10 days and 10 nights, we had put 80% of the servers back into service. The remaining 20%, the most complex, still took us almost three months.”
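The per-server routine Marchandiau outlines is in effect a strict sequence in which a server only reaches the network once every earlier step has succeeded. A minimal sketch of that ordering follows; the step names and the `bring_back` function are hypothetical, and each action is a stand-in callable rather than a real Rubrik or SentinelOne call.

```python
from typing import Callable, List, Mapping

# Order taken from the description above: restore, start, verify,
# install the EDR, scan completely, then connect to the network.
STEP_ORDER = (
    "restore",
    "boot",
    "verify",
    "install_edr",
    "full_scan",
    "connect_network",
)


def bring_back(server: str, actions: Mapping[str, Callable[[str], bool]]) -> List[str]:
    """Run each step in order and stop at the first one that fails.

    Returns the list of steps that completed, so a server whose scan
    fails is never connected to the network.
    """
    completed = []
    for step in STEP_ORDER:
        if not actions[step](server):
            break
        completed.append(step)
    return completed
```

On the article's figures, 80% of 800 servers in 10 days works out to roughly 64 servers a day between three people, which suggests why a fixed, repeatable sequence mattered.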
The first to be restored was the Active Directory (AD) server, upon which all Manutan’s other systems are identified. “That’s right,” says Marchandiau, “Our Active Directory was carrying that weight for years [and] it would have to be cleaned up to make sure it was free of backdoors. But, obviously, we didn’t have the time so we bet that the protection of the EDR and cutting off communications with other subsidiaries would guarantee sufficient protection to relaunch our IT as soon as possible.”
But Marchandiau quickly discovered another problem – the EDR was slowing down applications, considerably.
“The solution was to manually clean up all of our servers to remove structural weaknesses and relieve the burden on the EDR. But, once again, embarking on this adventure would have cost us infinite time. We have decided to postpone this work,” he says.
But things were not done yet. All the freshly restored servers were fully backed up again, but these backups did not overwrite the previous ones, since they are immutable. “Our backup appliances were all suddenly full! I had to call Rubrik for more help with cleaning up,” says Marchandiau.
LeMagIT was not able to obtain a detailed description of how Rubrik rid Manutan of its old backups, but it seems likely the supplier may have had no choice but to bring in new blank hard drives. Whether these replaced the previous ones or were added to them, this seems to be a free service which, perhaps understandably, the supplier does not much care to advertise.
Four lessons learned
The IT systems could now be restarted, but overall, Manutan no longer judged its prior approach to IT to be sustainable going forward.
“Obviously, we had four weaknesses,” says Marchandiau. “The first is that 75% of our applications and databases are based on Microsoft. We need to reduce that attack surface. Ultimately, we will no longer have SQL Server, we will probably replace them with PostgreSQL or MariaDB.
“The second lesson is that all of our servers share their network storage with each other, so that their applications can exchange data. This creates gateways which are vectors of propagation. From now on, there will be a break in the protocols between the applications. We will go through an intermediation platform: a single server that shares and filters files for everyone, as well as an ESB that converts the data. This is the most complicated part to put in place.”
“Third, our applications themselves are not secure. We have to rebuild them, rewrite them to be secure by design,” he says, although he does not specify whether that means making them executable by Linux servers too.
Finally, the decision has been taken to drop Veeam and NetBackup. “It is not so much these solutions in themselves that are problematic, [but] above all that faced with an incident such as the one we have experienced, we can no longer afford to have three different restoration methods; this adds too much complexity,” he explains, referring to the fact that Veeam and NetBackup appliances had to be restored from Rubrik before they could restore the data they themselves held.
This content migration off Veeam and NetBackup to Rubrik will take time, and Manutan estimates that it will take until the end of 2021 before it can say it only has Rubrik backups in service.
Rebuild everything from scratch
But applying all the changes Manutan wants to make is simply not possible with the existing IT systems, and therefore – with its service provider’s assistance – it is starting from scratch with brand new infrastructure. In September, the organisation embarked on a programme to completely overhaul its IT systems – called Horizon – which will last until the first quarter of 2023.
The Horizon programme involves the complete replacement of the infrastructure in a new datacentre, around a new network core, with new access methodologies. Nutanix clusters will replace VMware clusters. There will be new security policies, no more Windows dating from before 2016, and no Red Hat Linux dating from before version 7.9. Moreover, the server images will not be migrated but rather replaced by others, built from scratch from virgin systems. The only thing to be recovered from the old datacentres will be the data itself, via a decontamination service, naturally.
Ultimately, everything will be monitored by Splunk’s security information and event management (SIEM) service, which will continuously probe all network traffic and server activity. Oversight will be entrusted to a service provider’s managed security operations centre (SOC) team, who will take the necessary actions in the event of the slightest alert.
“This overhaul will cost us several million euros, which is much more expensive than if we had paid the requested ransom,” says Marchandiau. He doesn’t regret this though, and for good reason – even if Manutan had paid a ransom, the overhaul would still have been inevitable.
A clear-cut, responsive attitude
In its reconstruction plan, Manutan could have chosen to switch everything to the cloud but did not – why is this?
“We are quite fond of controlling our own infrastructure,” says Marchandiau. “When we announced we had been the victim of ransomware, the cloud providers we worked with cut us off and explained they would reactivate our services when they found out exactly what had happened to us. They are protecting themselves. I want to protect us too; we do not want to depend on their flaws any more.”
Moreover, he reveals, no Manutan server has direct access to the cloud any more, and all exchanges must now go through a web proxy. “We sincerely believe that our next incident will begin through a partner. We collaborate with startups that are as inventive about cloud applications as they are fallible about their infrastructures,” says Marchandiau.
On the human level, the team in charge of the new infrastructure is being reinforced with a dedicated architect, and cyber security awareness campaigns have been implemented, with exercises now carried out every fortnight across the organisation.
“Since these events, we have had new alerts, new phishing attempts. And it is clear that everyone was very responsive,” says Marchandiau, who is planning to start an intrusion test campaign as soon as he gets back from lunch.
The original version of this article by Yann Serra can be read at LeMagIT, Computer Weekly’s sister title. This version was translated from the original French by security editor Alex Scroxton.