Arup: $25M via deepfake CFO on a video call (Hong Kong, February 2024)
Fifteen transfers, five Hong Kong accounts, $25M gone after a single video call. The finance employee approved the transaction because "the CFO and other colleagues were on the call"; every one of them was a real-time video and voice deepfake. The canonical case of AI-assisted social engineering.
Manuel López Pérez · Tutorials

In late January 2024, a finance employee at Arup in Hong Kong (the British engineering consultancy behind the structure of the Sydney Opera House) gets an email that appears to come from the CFO in London, asking for a confidential transaction. The employee suspects phishing; the message has the classic business email compromise shape. He raises the concern. The reply invites him to a video call with the CFO and other executives to confirm.
The employee joins. He recognises the CFO on screen. He recognises the other executives. The voices match what he remembers. The doubt fades. After the call he runs 15 transfers totalling 200 million Hong Kong dollars — around $25.6 million — to 5 bank accounts the “CFO” specified. It takes several days before London questions the transaction. By the time Arup contacts the Hong Kong police, the money has already moved.
The police publish the figure in February 2024 without naming the victim. On 16 May 2024, in a statement to CNN, Arup confirms that it is the firm involved. It is the canonical case of AI-assisted social engineering of the year.
Lab: the full attack is not reproduced against third parties. What can be done — and is done below — is build the pipeline against yourself (your own audio and video, consumer GPU) to understand latencies, quality and detectable artefacts; then run it through a detector and measure TP/FP rate. That is ethical, replicable lab work.
Why this case matters
Until late 2023, the public corpus of high-quality deepfakes served two use cases: entertainment and targeted harassment. The recurring line in awareness presentations was “we haven’t seen a live deepfake BEC at this scale yet”. Arup retires that line in a single operation.
The combination that runs the attack:
- Prior reconnaissance — LinkedIn and public statements from the real CFO to profile voice, mannerisms, register, direct contacts in the finance hierarchy.
- Email pretexting — first touch with the classic “urgent confidential transfer” pattern so that the victim is already inside the script when the video call arrives.
- Live deepfake, not pre-recorded — the CFO and the other executives are on the call answering in real time. The victim can ask questions, the executives answer coherently. The psychological barrier of “if it responds to me, it’s real” breaks.
- Multiple simulated participants — it is not one-to-one. Several executives back the decision. Group pressure works for the attacker.
The detail that holds the attack up is not the technical quality of the deepfake — 2024 deepfakes still have detectable artefacts if you know the format. It is that the victim was not looking for artefacts. The finance employee has an operational task, not a forensic validation. When the CFO on screen approves, the cognitive flow is “confirmed, execute”, not “verify whether this is a deepfake”.
The chain reconstructed
- Public collection (weeks to months earlier).
  - LinkedIn + conferences + interviews → voice model of the real CFO, trainable on a couple of minutes of clean audio. Tooling: ElevenLabs (commercial service); Tortoise TTS, F5-TTS and Coqui XTTS-v2 (open-source).
  - Public video + conference talks → face model for real-time face-swap. Open-source: DeepFaceLab, SimSwap, LivePortrait. Inference on consumer GPU (RTX 4090) at 30 FPS.
- Pretext email (day N).
  - Classic BEC pattern, language matching the real CFO. Likely LLM-assisted for tone and internal terminology.
  - Confidential transaction request → psychological trigger the employee will question. It is desirable that he questions it, because the answer is the video call.
- Video call with live deepfakes (day N+1 or N+2).
  - Multiple simulated participants: CFO + 2-3 executives. Likely human actors behind each deepfake, not autonomous AI agents (the case is January 2024; Operator and Computer Use were not yet public).
  - Real-time audio with voice cloning. Video with face-swap.
  - Responses coherent with the role each deepfake is playing.
- Execution (following days).
  - 15 transfers to 5 accounts. The split is consistent with staying below per-transaction internal approval limits. Hong Kong accounts, a jurisdiction with fast layering to other destinations.
- Detection (days to weeks later).
  - Communication between Hong Kong and London surfaces the discrepancy. The real CFO never requested the transaction.
  - Arup notifies the Hong Kong police in January. Public disclosure in February (unnamed). Arup named in May.
What did not work (and what would have)
What failed in Arup’s controls, according to a public statement from the firm’s own CIO:
- Verification based only on a communication channel the attacker controls. Email and video call are both manipulable channels. There was no verification through an alternative, uncontrolled channel.
- No dual authorisation for transfers of that size. $25M ran with the Hong Kong employee’s approval, no additional sign-off in London.
- Acceptable splitting. 15 transfers below the enhanced review threshold — the system only questioned single transfers above a certain value.
- Transitive trust in the deepfake. The employee approved because “the CFO confirmed on video”. “Seeing is believing” remains the cognitive default, and the technical control has to assume that channel is not reliable.
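The splitting gap above can be illustrated with a rolling-window check. This is a minimal sketch, not Arup's system: the `Transfer` type, the `needs_enhanced_review` function, the 7-day window and the limit value are all assumptions chosen for illustration. The idea is to compare the requester's cumulative total against the limit, instead of each transfer on its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Transfer:
    requester: str
    beneficiary: str
    amount: float            # same currency as the limit
    requested_at: datetime


def needs_enhanced_review(new, history, limit, window=timedelta(days=7)):
    """True when the requester's rolling total, including `new`, reaches the limit.

    `limit` and `window` are hypothetical policy values. Because the total
    includes the new transfer, a single oversized transfer is also caught.
    """
    window_start = new.requested_at - window
    rolling_total = new.amount + sum(
        t.amount
        for t in history
        if t.requester == new.requester
        and window_start <= t.requested_at <= new.requested_at
    )
    return rolling_total >= limit


# Illustrative values only: 200M HKD split into 15 equal parts, 20M HKD limit
start = datetime(2024, 1, 1, 10, 0)          # arbitrary timestamp
part = 200_000_000 / 15
first = Transfer("employee", "account-1", part, start)
second = Transfer("employee", "account-2", part, start + timedelta(hours=1))
print(needs_enhanced_review(first, [], limit=20_000_000))
print(needs_enhanced_review(second, [first], limit=20_000_000))
```

With these made-up numbers the first transfer passes on its own and the second one trips the review, which is the behaviour a per-transaction threshold cannot provide.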
What would have stopped the attack, in decreasing order of cost:
- Mandatory out-of-band verification for transactions above a threshold. Call the real CFO at the personal phone number listed in the internal directory (not the one from the video call), or message his real Slack/Teams account, before executing. Costs minutes. Would have caught the attack at step 3.
- Dual authorisation with geographic segregation. Any transfer >$1M requires sign-off from a second approver in another jurisdiction. Standard in banking; not operationally enforced at Arup.
- Time delay for bulk transfers to new accounts. 24-48h hold with auto-cancel if the transaction is not confirmed through an alternative channel. Would have stopped all 15 transfers.
- Shared codeword or passphrase for verbal authorisation on video. The real CFO and the finance team carry a code rotated monthly. The deepfake does not know it. Trivial to implement, uncommon in practice.
- Technical awareness banner for corporate clients. “If someone asks for a transfer on a video call, assume deepfake until out-of-band verification.” Costs zero. Changes the cognitive default.
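The first three controls in the list can be sketched as a single release gate. This is an illustration under stated assumptions: `TransferRequest`, `release_blockers` and the out-of-band threshold of $100,000 are hypothetical; the $1M dual-authorisation threshold and the 24h hold are taken from the list above. A transfer is released only when the returned list is empty.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TransferRequest:
    amount_usd: float
    beneficiary_is_new: bool
    created_at: datetime
    # Jurisdictions of the people who signed off, e.g. {"HK", "UK"}
    approver_regions: set = field(default_factory=set)
    # Set only after a call-back on a directory number or a message on a
    # known-good account, never from inside the requesting channel
    out_of_band_verified: bool = False


def release_blockers(req, now,
                     oob_threshold=100_000,        # hypothetical value
                     dual_threshold=1_000_000,     # ">$1M" from the list above
                     hold=timedelta(hours=24)):    # lower bound of the 24-48h hold
    """Return the reasons a transfer cannot be released yet (empty list = release)."""
    blockers = []
    if req.amount_usd >= oob_threshold and not req.out_of_band_verified:
        blockers.append("out-of-band verification missing")
    if req.amount_usd > dual_threshold and len(req.approver_regions) < 2:
        blockers.append("second approver in another jurisdiction missing")
    if req.beneficiary_is_new and now - req.created_at < hold:
        blockers.append("hold period for new beneficiary not elapsed")
    return blockers


created = datetime(2024, 1, 1, 9, 0)                 # arbitrary timestamp
req = TransferRequest(1_700_000, True, created, {"HK"})
for reason in release_blockers(req, now=created + timedelta(hours=1)):
    print("BLOCKED:", reason)
```

The design choice worth noting is that a video call never appears as an input: nothing said or shown on the call can clear a blocker.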
Lab: offensive pipeline and detection, against yourself
The reproducible stack to understand the real cost of the attack (all open-source, consumer GPU, ~1 day of work):
```bash
# 1) Voice cloning: XTTS-v2 (Coqui), 6 seconds of clean audio
git clone https://github.com/coqui-ai/TTS && cd TTS
pip install -e .
tts --text "I need you to run the transfer today" \
    --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --speaker_wav samples/my_voice_6s.wav --language_idx en \
    --out_path clone.wav
# Inference latency: 200-400 ms/sentence on RTX 4090
# Quality: natural prosody; artefacts detectable only on sibilants and breath

# 2) Real-time face-swap: DeepFaceLive (continuation of DeepFaceLab)
git clone https://github.com/iperov/DeepFaceLive
# Pre-trained DFM model (200K iter on target face, ~6h on RTX 4090)
# Inference: 25-30 FPS @ 720p on RTX 4090, 10-15 FPS on RTX 3060

# 3) Inject the stream into a video call (OBS Virtual Camera)
sudo apt install obs-studio v4l2loopback-dkms
sudo modprobe v4l2loopback devices=1 video_nr=10 \
    card_label="Virtual Camera" exclusive_caps=1
# OBS → Source: DeepFaceLive → Start Virtual Camera → selectable in Zoom/Teams

# 4) Audio: route XTTS-v2 to the virtual mic via pipewire
pw-loopback --capture-props='node.name=tts-out' \
    --playback-props='node.name=mic-virtual'
```

Total cost: ~$2000 in hardware, cloud GPU budget around $5-10 for a full session. Technical access to the attack is not the barrier.
Detection — what fails in real time
Systematic artefacts that persist even in 2024-2026 pipelines:
| Signal | How it is measured | Expected TP |
|---|---|---|
| Blink rate | OpenCV + Dlib landmarks → 8-12 blinks/min in humans, ~3-5 in DFL deepfake | 60-70 % |
| Facial PPG (heartbeat colour) | Luminance variation on forehead/cheeks at 1 Hz (Intel FakeCatcher) | 90+ % in lab |
| Lip-sync drift | Audio2Lip cross-correlation; voice-cloned + face-swap go out of sync | 70-80 % |
| Saccadic micro-movements | Eye tracking — humans do ~3 saccades/s; a rendered face looks “too still” | 50-60 % |
| Head pose vs neck transition | Edge limits where the face-swap mask meets the neck — flicker over 1-2 frames | 40-50 % at 720p |
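The lip-sync row can be made concrete with a plain cross-correlation. The sketch below assumes two per-frame signals already extracted and resampled to the video frame rate: an audio loudness envelope and a mouth-opening measure (for example the distance between lip landmarks). `estimate_lag_frames` and the synthetic data are illustrative, not part of any detector named in the table.

```python
import numpy as np


def estimate_lag_frames(audio_env, mouth_open, max_lag=15):
    """Lag, in frames, at which mouth opening best matches the audio envelope.

    A positive lag means the video trails the audio. Both inputs are 1-D
    arrays of equal length, one value per video frame.
    """
    a = (audio_env - audio_env.mean()) / (audio_env.std() + 1e-9)
    m = (mouth_open - mouth_open.mean()) / (mouth_open.std() + 1e-9)
    n = len(a)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            score = np.dot(a[: n - lag], m[lag:]) / (n - lag)
        else:
            score = np.dot(a[-lag:], m[: n + lag]) / (n + lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic check: the "mouth" signal is the audio envelope delayed by 4 frames
rng = np.random.default_rng(0)
audio = np.convolve(rng.random(300), np.ones(5) / 5, mode="same")
mouth = np.concatenate([np.zeros(4), audio[:-4]]) + rng.normal(0, 0.01, 300)
lag, score = estimate_lag_frames(audio, mouth)
print(f"estimated lag: {lag} frames, correlation: {score:.2f}")
```

On a real recording, a lag that drifts over the course of the call is the signal to look at; a constant offset is more likely ordinary audio/video desynchronisation.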
Minimal detection implementation with OpenCV + mediapipe that any security team can have running over recordings of critical video calls:
```python
# detect_deepfake_blink.py: defensive PoC
import cv2
import mediapipe as mp
import numpy as np

mp_face = mp.solutions.face_mesh
face = mp_face.FaceMesh(refine_landmarks=True)

# EAR (Eye Aspect Ratio) indices on mediapipe landmarks
LEFT_EYE = [33, 160, 158, 133, 153, 144]
RIGHT_EYE = [362, 385, 387, 263, 373, 380]
EAR_CLOSED = 0.21  # below this value the eye counts as closed


def ear(landmarks, idx, width, height):
    # Landmarks are normalised to [0, 1]; scale to pixels so the ratio is
    # not distorted by the frame's aspect ratio
    p = np.array([(landmarks[i].x * width, landmarks[i].y * height) for i in idx])
    return (np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])) \
        / (2.0 * np.linalg.norm(p[0] - p[3]))


cap = cv2.VideoCapture('callrecording.mp4')
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # some containers report 0; assume 30
blinks, prev_ear, frames = 0, 1.0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames += 1
    res = face.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not res.multi_face_landmarks:
        continue  # no face in this frame; it still counts towards duration
    lm = res.multi_face_landmarks[0].landmark
    h, w = frame.shape[:2]
    e = (ear(lm, LEFT_EYE, w, h) + ear(lm, RIGHT_EYE, w, h)) / 2
    if prev_ear > EAR_CLOSED and e < EAR_CLOSED:  # eye open → closed transition
        blinks += 1
    prev_ear = e
cap.release()

duration_min = frames / fps / 60
rate = blinks / max(duration_min, 0.001)
print(f"Blink rate: {rate:.1f}/min ({'SUSPICIOUS' if rate < 6 else 'normal'})")
```

On 5 min of legitimate video of the same person, the typical rate is 10-14/min. On a DFL face-swap of the same speaker, it drops to 3-5/min, a robust gap.
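To turn the blink-rate gap into the TP/FP figures the lab is meant to produce, the per-clip rates can be scored against labels. The sketch below is only a scaffold: `blink_detector_rates` is a hypothetical helper, and the two lists are made-up placeholder values to be replaced with rates measured on your own real and face-swapped clips.

```python
def blink_detector_rates(real_rates, fake_rates, threshold=6.0):
    """True-positive and false-positive rate of the rule 'rate < threshold is suspicious'.

    real_rates / fake_rates: blinks per minute, one value per labelled clip.
    """
    true_positives = sum(r < threshold for r in fake_rates)
    false_positives = sum(r < threshold for r in real_rates)
    return true_positives / len(fake_rates), false_positives / len(real_rates)


# Placeholder numbers for illustration only, NOT measurements
real = [12.1, 10.4, 13.8, 5.2, 11.0]
fake = [3.1, 4.8, 7.5, 3.9, 4.2]

for threshold in (4.0, 6.0, 8.0):
    tpr, fpr = blink_detector_rates(real, fake, threshold)
    print(f"threshold {threshold:.0f}/min: TP rate {tpr:.0%}, FP rate {fpr:.0%}")
```

Sweeping the threshold makes the trade-off visible: raising it catches more face-swaps but also flags real speakers who blink rarely, which is one reason a single signal should not act as a gate on its own.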
SaaS detection services with documented APIs: Intel FakeCatcher (proprietary, PPG-based), Reality Defender (multi-modal), Sensity AI (image+video), Hive Moderation (frame-level). Latency <10 s on a 30 s clip. Useful as second opinion inside approval flows, not as a single gate.
Operational lessons for 2024+
The Arup case is an industrial-scale proof of concept of three patterns that will keep repeating:
- Tier-1 identity verification no longer scales with an outsourced workforce and a three-question protocol. The Marks & Spencer attack in April 2025 shows the same point in reverse: there, a human attacker poses as an employee to the helpdesk; at Arup, the deepfake poses as an executive to the employee. The pattern “someone with a recognisable voice or face requests a privileged action” cannot be trusted without out-of-band verification.
- Financial transactions require a chain of trust outside the channel. Any process that depends exclusively on email + voice + video assumes none of the three is manipulable. In 2024 all three are.
- Deepfake awareness as primary threat awareness, not secondary. Awareness training programmes in 2024-2025 move deepfake from the “emerging threats” block to the main block. The Arup case is the justification.
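One way to anchor trust outside the channel is the monthly rotated codeword suggested earlier. The sketch below derives it from a shared secret with the Python standard library, so nobody has to distribute a new word each month. `monthly_code`, `verify_code` and the secret value are hypothetical; in practice the secret would be provisioned in person or through a password manager, never over email or a call.

```python
import hashlib
import hmac
from datetime import date


def monthly_code(secret: bytes, day: date, digits: int = 6) -> str:
    """Derive a short numeric code that changes every calendar month."""
    period = f"{day.year:04d}-{day.month:02d}".encode()
    digest = hmac.new(secret, period, hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % (10 ** digits)
    return f"{number:0{digits}d}"


def verify_code(secret: bytes, spoken: str, day: date) -> bool:
    """Constant-time comparison of the code read out on the call."""
    return hmac.compare_digest(monthly_code(secret, day, len(spoken)), spoken)


secret = b"example-only-secret"      # placeholder, do not reuse
today = date(2024, 2, 5)             # arbitrary date
code = monthly_code(secret, today)
print(len(code), verify_code(secret, code, today))
```

An impersonator who has cloned a voice and a face from public footage still lacks the secret, so the code adds a factor that does not depend on what the call looks or sounds like.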
Regulators respond accordingly. EU AI Act Art. 50 (transparency for deepfakes and AI-generated content) is part of Regulation (EU) 2024/1689, in force since 1 August 2024; the Art. 50 obligations apply from 2 August 2026 (2 February 2025 is the date from which the Art. 5 prohibitions apply). FinCEN publishes an alert specifically on deepfake-enabled fraud in November 2024. The Hong Kong Securities and Futures Commission issues a circular in March 2024, the month after the case became public, requiring identity verification controls for significant transfers.
The statement from Arup’s CIO, Rob Greig (to CNN and Fortune in May 2024), sums it up:
“Like many other businesses around the globe, our operations are subject to regular attacks, including invoice fraud, phishing scams, WhatsApp voice spoofing, and deepfakes. What we have seen is that the number and sophistication of these attacks has been rising sharply in recent months.”
The $25M figure is the headline. The detail is that Arup has a mature security programme, a dedicated department and industry-standard controls. An attacker with a GPU inference budget, public collection and human actors gets past them. The takeaway is not “Arup did something wrong” — it is “industry-standard controls as of late 2023 did not contemplate this vector”, and any team running large transactions from 2024 onwards has to assume so.
References
- CNN Business, Arup revealed as victim of $25 million deepfake scam (16 May 2024): https://www.cnn.com/2024/05/16/tech/arup-deepfake-scam-loss-hong-kong-intl-hnk
- Fortune, A deepfake ‘CFO’ tricked British design firm Arup in $25 million scam (17 May 2024): https://fortune.com/europe/2024/05/17/arup-deepfake-fraud-scam-victim-hong-kong-25-million-cfo/
- CNN, Finance worker pays out $25 million after video call with deepfake CFO (4 February 2024, no attribution to Arup): https://www.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk
- World Economic Forum, Cybercrime: Lessons learned from a $25m deepfake attack (February 2025 retrospective): https://www.weforum.org/stories/2025/02/deepfake-ai-cybercrime-arup/
- FinCEN, FinCEN Alert on Fraud Schemes Involving Deepfake Media Targeting Financial Institutions, FIN-2024-Alert004 (13 November 2024).
- Hong Kong SFC, Circular on Deepfake-Enabled Fraud (March 2024).
- Regulation (EU) 2024/1689 Art. 50 — transparency for AI-generated content.


