Protection against DDoS attacks: anycast, scrubbing centers, rate limiting and response automation

Depov

DDoS attack business logic: TTPs in MITRE ATT&CK and what the SOC should see
Before building DDoS-resilient infrastructure, let's look at how an attack appears through the lens of TTPs. Without that, correlation rules and detection logic will be skewed.

The preparatory phase. The attacker collects the IP addresses of the target infrastructure (IP Addresses, T1590.005, Reconnaissance) and builds or rents a botnet (Botnet, T1583.005, Resource Development). According to Terrazone, Aisuru botnets in 2025 numbered from 1 to 4 million infected devices, and DDoS-for-hire platforms have lowered the entry threshold to a few hundred dollars and a Telegram account. Ordering a DDoS attack is now easier than setting up protection against one.

Strike phase: Impact in the MITRE ATT&CK classification. Each technique requires its own layer of protection, and confusing them is a typical mistake:
• Direct Network Flood (T1498.001) - volumetric: UDP flood, ICMP flood. The goal is to saturate the link. According to Terrazone, attack volume reached 31.4 Tbps at the end of 2025.
• Reflection Amplification (T1498.002) - amplified attacks via DNS, NTP and Memcached. The sources are open recursive DNS resolvers and misconfigured Memcached servers, of which there are still plenty on the Internet.
• Service Exhaustion Flood (T1499.002) - SYN flood, exhaustion of state tables on stateful firewalls and load balancers. The link is not saturated; it is connection tracking that dies.
• Application Exhaustion Flood (T1499.003) - HTTP flood, Slowloris, API abuse. Each request looks legitimate but consumes a disproportionate amount of backend resources.
Multi-vector is the norm, not the exception. According to StormWall, the share of multi-vector attacks grew from 25% to 52% in 2025. Terrazone reports that 58% of DDoS attacks use two or more vectors, switching sequentially between TCP carpet bombing, UDP floods, DNS amplification and SYN floods, sometimes within three minutes. According to StormWall, average attack power grew by 63%, from 71 Gbps to 116 Gbps.

Attackers are not fools: while one vector distracts the NOC, another punches through a different layer.

DDoS as a distraction is a red flag for the SOC. While everyone is cleaning up inbound traffic, attackers can exfiltrate data or gain a foothold through compromised hosts inside the perimeter. Abnormal outbound traffic from internal servers during a DDoS attack is a signal that must be investigated in parallel with mitigation, not afterwards. Per OWASP A09:2021 (Security Logging and Monitoring Failures), organizations whose logging degrades under DDoS load miss this vector. I have seen this in every third incident analysis.
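The parallel check on outbound traffic can be expressed as a simple correlation rule. The sketch below is a minimal illustration, assuming you already collect per-host egress rates and a per-host baseline from flow telemetry; the function and class names, the 3x multiplier and the 1 Mbps floor are hypothetical, not taken from any SIEM product.

```python
from dataclasses import dataclass


@dataclass
class HostEgress:
    host: str
    current_bps: float   # outbound rate in the current interval
    baseline_bps: float  # hypothetical per-host baseline, e.g. a 7-day average


def suspicious_egress(hosts, ddos_active, multiplier=3.0, min_bps=1e6):
    """Return internal hosts whose egress jumps while an inbound DDoS is active.

    Evaluated only during an active DDoS incident, when exfiltration is most
    likely to be masked by the inbound noise.
    """
    if not ddos_active:
        return []
    flagged = []
    for h in hosts:
        # Ignore tiny absolute rates so idle hosts with a near-zero baseline
        # do not fire on a few extra kilobytes
        if h.current_bps < min_bps:
            continue
        if h.current_bps > multiplier * max(h.baseline_bps, 1.0):
            flagged.append(h.host)
    return flagged
```

In practice the flagged list would feed a separate SOC alert rather than the DDoS mitigation pipeline, so the two investigations run side by side.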

Financial impact in concrete figures: according to Terrazone, the average cost of a DDoS incident is $52,000 for SMBs and $444,000 for enterprises. That is before reputational damage and regulatory fines.
DDoS-resilient infrastructure: anycast, scrubbing and hybrid architecture
BGP Anycast: geometric advantage vs. volume
In an anycast topology, the same IP prefix is announced from several Points of Presence (PoPs) at once. Incoming traffic, both legitimate and malicious, is routed to the nearest PoP by BGP shortest-path selection.

The advantage here is geometric: the attack volume is split across N points. According to Haltdos, a 2 Tbps attack on an anycast network of 30 PoPs lands about 67 Gbps on each point, within the scrubbing capacity of a single platform. Roughly speaking, the more PoPs, the less each one receives.
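The per-PoP arithmetic is easy to reproduce. This is a minimal sketch assuming the attack spreads evenly across PoPs, which real botnet geography rarely does; the function names and the capacity values used in the checks are illustrative.

```python
import math


def per_pop_load_gbps(attack_gbps: float, n_pops: int) -> float:
    """Average load per PoP, assuming the attack spreads evenly across all PoPs."""
    if n_pops <= 0:
        raise ValueError("n_pops must be positive")
    return attack_gbps / n_pops


def min_pops_needed(attack_gbps: float, per_pop_capacity_gbps: float) -> int:
    """Smallest PoP count at which an even split fits the per-PoP scrubbing capacity."""
    return math.ceil(attack_gbps / per_pop_capacity_gbps)


# The Haltdos example from the text: 2 Tbps over 30 PoPs
print(round(per_pop_load_gbps(2000, 30), 1))  # 66.7
```

Because the busiest PoP always receives at least the average, treat this figure as a lower bound for the hottest PoP, not a guarantee.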

Anycast latency is determined by inline scrubbing at the PoP: 1-4 ms on ASIC-accelerated platforms. When an attack starts, the PoP filters it without BGP changes, without a tunnel and without waiting for convergence. Clean traffic exits at the PoP closest to the user.

Limitation: PoP density. A network of 5 PoPs is an argument about capacity, not about latency. According to Haltdos, an anycast provider with fewer than 15 PoPs does not actually distribute an attack. When evaluating a provider, check the number of PoPs in your region and the type of scrubbing: ASIC-based (hardware-accelerated) or software-based (with significantly lower per-PoP bandwidth). The difference between them is an order of magnitude.

Limitation: L7 attacks. An anycast PoP doing L3/L4 filtering on sampled NetFlow (1:1000 or 1:2000 sampling) on 10 Gbps interfaces has a statistical detection floor. Low-bandwidth HTTP floods at sub-Gbps levels go unnoticed for 60-120 seconds, until the density of flow records reaches the threshold. L7 attacks require WAF integration; anycast is blind here.
Scrubbing center: how it works and the hidden price of BGP diversion
In the on-demand scrubbing model, traffic normally goes directly to the origin. When an attack is detected, either by the client's telemetry or by the provider's monitoring, BGP announcements are modified and all incoming traffic is redirected to the scrubbing center. Clean traffic is tunneled back to the origin over GRE or MPLS.
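Why sampling creates a detection floor can be shown with a back-of-the-envelope estimate. With 1:N packet sampling, the number of sampled packets in an interval is roughly Poisson-distributed, so the relative error of the estimated rate is about 1/sqrt(k) for k sampled packets. The sketch assumes uniform random sampling; the 5,000 pps flood and the 10% target error are hypothetical, and real collectors apply their own threshold logic on top.

```python
def sampled_pps(attack_pps: float, sampling_n: int) -> float:
    """Expected sampled packets per second with 1:N packet sampling."""
    return attack_pps / sampling_n


def seconds_for_confidence(attack_pps: float, sampling_n: int, rel_error: float) -> float:
    """Time until the rate estimate reaches a given relative error.

    For a Poisson-like count k the relative standard error is about
    1/sqrt(k), so at least 1/rel_error**2 sampled packets are needed.
    """
    needed = 1.0 / rel_error ** 2
    return needed / sampled_pps(attack_pps, sampling_n)


# Hypothetical low-rate HTTP flood: 5,000 pps toward one /32, 1:2000 sampling,
# rate estimate wanted within ~10%
print(seconds_for_confidence(5_000, 2_000, 0.10))
```

Under these assumptions it takes about 40 seconds just to accumulate enough samples, before any threshold logic runs, and halving the attack rate doubles that time.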

The capacity here is real: a single scrubbing center handles 10-20 Tbps of scrubbing throughput. For hyper-volume attacks aimed at a single prefix, this is critical.

But the hidden price is a latency spike at activation time. BGP convergence means propagating the new announcements, converging the global routing table, establishing the GRE tunnel and stabilizing routes. According to Haltdos modeling of an enterprise scenario (10 Gbps attack, origin in one region, scrubbing in a neighboring one, tier-2 BGP transit), full detection + diversion + scrubbing takes 3-10 minutes. For real-time applications (VoIP, trading platforms, payment gateways) that require RTT < 20 ms, the latency SLA is breached even after mitigation starts, during the stabilization window.
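The stages listed above add up sequentially, which is why on-demand diversion lands in minutes rather than seconds. A minimal budgeting sketch follows; the stage names simply mirror the text, and every duration you pass in should come from your own drills or the provider's figures (the numbers in the checks are placeholders, not measurements).

```python
from dataclasses import dataclass


@dataclass
class DiversionStages:
    # All values in seconds; fill in from your own drills or the provider's SLA
    detection: float
    announcement_propagation: float
    global_convergence: float
    gre_tunnel_setup: float
    route_stabilization: float

    def total(self) -> float:
        """Stages run one after another, so the window is their sum."""
        return (self.detection + self.announcement_propagation
                + self.global_convergence + self.gre_tunnel_setup
                + self.route_stabilization)


def sla_exposure(stages: DiversionStages, tolerated_outage_s: float) -> float:
    """Seconds of the diversion window that exceed what the SLA tolerates."""
    return max(0.0, stages.total() - tolerated_outage_s)
```

Comparing the result with an anycast PoP, where the inline-scrubbing path skips the BGP, tunnel and convergence stages entirely, makes the trade-off explicit.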

The 6 minutes from the postmortem at the beginning of the article are exactly this price.
Decision matrix: anycast vs scrubbing vs hybrid
Mature production deployments combine both approaches, splitting responsibility by attack class.


Hybrid architecture in practice:
• Anycast PoP - inline ASIC filtering of volumetric L3/L4 attacks. No BGP changes, no latency spike.
• Scrubbing center - held in reserve for hyper-volume events that exceed per-PoP capacity. Activated through pre-configured BGP communities (not reactive flow-triggered diversion), with convergence time under 60 seconds.
• WAF - covers L7: HTTP floods, API abuse, Slowloris, i.e. everything below the volumetric detection threshold.
• GSLB - health checks and redirection of traffic away from PoPs or regions with residual degradation.
Questions to ask the provider before signing the contract (as recommended by Haltdos): What is the BGP convergence time from the PoP nearest to your region (not the global average)? What sampling rate is used on 10 Gbps+ interfaces, and what is the minimum attack bandwidth detectable within 30 seconds? How many PoPs have ASIC-based scrubbing rather than software-based? If the provider cannot answer with specific figures, that is already the answer.
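The split of responsibilities in a hybrid setup can be sketched as a dispatch function. Everything here is illustrative: the vector names, the even-split assumption and the per-PoP capacity you pass in are assumptions, and GSLB failover is not modeled.

```python
L7_VECTORS = {"http_flood", "api_abuse", "slowloris"}
VOLUMETRIC_VECTORS = {"udp_flood", "icmp_flood", "dns_amplification",
                      "ntp_amplification", "memcached_amplification", "syn_flood"}


def choose_layer(vector: str, attack_gbps: float,
                 n_pops: int, per_pop_capacity_gbps: float) -> str:
    """Map an attack to the layer of the hybrid architecture that should absorb it."""
    if vector in L7_VECTORS:
        return "waf"  # below the volumetric detection threshold
    if vector not in VOLUMETRIC_VECTORS:
        raise ValueError(f"unknown vector: {vector}")
    # Even-split assumption: real per-PoP load depends on botnet geography
    if attack_gbps / n_pops <= per_pop_capacity_gbps:
        return "anycast_pop"  # inline ASIC filtering, no BGP change
    return "scrubbing_center"  # pre-configured BGP community diversion
```

A real controller would evaluate the hottest PoP rather than the average, but the ordering of the checks mirrors the list: L7 first, then per-PoP capacity, then the scrubbing reserve.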
Rate limiting at the application layer
Rate limiting protects the application layer once volumetric filtering has done its job. A wrong threshold breaks legitimate users before it stops the attack: I have seen a company DoS itself by setting a 5 r/s limit on the endpoint its mobile client used.

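Sliding window vs fixed window. A fixed window (100 requests per minute) creates a boundary burst: the attacker sends 100 requests in the last second of one window and 100 in the first second of the next, i.e. 200 requests in 2 seconds. A sliding window built on Redis sorted sets closes this hole:
```python
import time
import uuid

import redis

r = redis.Redis()  # create once and reuse the connection pool


def check_rate_limit(client_id: str, limit: int = 100, window: int = 60) -> bool:
    now = time.time()
    key = f"rate:{client_id}"
    pipe = r.pipeline()  # MULTI/EXEC transaction by default
    pipe.zremrangebyscore(key, 0, now - window)  # drop requests outside the window
    # Unique member so simultaneous requests are not merged; rejected requests
    # are recorded too, so a client that keeps hammering stays limited
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    pipe.zcard(key)  # requests in the window, including this one
    pipe.expire(key, window)  # let idle clients' keys expire
    return pipe.execute()[2] <= limit
```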
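To see the boundary burst concretely, the sketch below compares a fixed-window counter with an in-memory sliding log that uses the same cutoff as the ZREMRANGEBYSCORE call. It is a self-contained illustration, not a production limiter: it keeps state in a local deque instead of Redis and, unlike the Redis version, records only accepted requests. Class and function names are made up for the example.

```python
from collections import deque


class FixedWindow:
    """Counter that resets at every window boundary."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.current = None
        self.count = 0

    def allow(self, t: float) -> bool:
        w = int(t // self.window)
        if w != self.current:  # new window: the counter starts from zero
            self.current, self.count = w, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingLog:
    """In-memory analogue of the sorted-set approach (accepted requests only)."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.log = deque()

    def allow(self, t: float) -> bool:
        # Same cutoff as ZREMRANGEBYSCORE key 0 (now - window)
        while self.log and self.log[0] <= t - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(t)
            return True
        return False


def boundary_burst(limiter) -> int:
    """100 requests just before the minute boundary, 100 just after it."""
    times = [59.5] * 100 + [60.5] * 100
    return sum(limiter.allow(t) for t in times)


print(boundary_burst(FixedWindow(100, 60)))  # 200
print(boundary_burst(SlidingLog(100, 60)))   # 100
```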
Differentiated limits. Login, report generation and APIs with heavy DB queries get limits 5-10 times lower than read operations. In nginx, limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s; defines the zone, and limit_req zone=api burst=20 nodelay; applies it in the location: 10 requests per second with short bursts of up to 20 requests above that rate. Authenticated users get a higher limit through a separate zone.
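How rate plus burst plus nodelay behaves is easier to see in a model. The class below is a simplified sketch of the leaky-bucket accounting nginx's limit_req uses (an "excess" counter that drains at the configured rate), written from the module's documented behavior rather than copied from nginx source; it ignores millisecond granularity and models only the nodelay case, where excess requests within the burst are served immediately instead of being delayed.

```python
class NginxLikeLimiter:
    """Simplified model of limit_req with burst and nodelay.

    'excess' counts requests above the configured rate and drains at
    `rate` per second; a request is rejected when excess would exceed `burst`.
    """

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.excess = 0.0
        self.last = None

    def allow(self, t: float) -> bool:
        if self.last is None:  # first request from this key
            self.last = t
            return True
        excess = max(0.0, self.excess - (t - self.last) * self.rate + 1)
        if excess > self.burst:
            return False  # rejected requests do not change the state
        self.excess, self.last = excess, t
        return True


lim = NginxLikeLimiter(rate=10, burst=20)
print(sum(lim.allow(0.0) for _ in range(30)))  # 21: one at the rate plus a burst of 20
print(sum(lim.allow(1.0) for _ in range(30)))  # 10: one second drained 10 units of excess
```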

Responding to excess: HTTP 429 with the Retry-After, X-RateLimit-Remaining and X-RateLimit-Reset headers. Well-written clients (and advanced botnets with human-like behavior) honor Retry-After, which reduces the load.
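A framework-agnostic sketch of building that 429 response follows. Retry-After in delta-seconds is defined by HTTP; the X-RateLimit-* headers are a de facto convention whose exact semantics vary between APIs, so sending Reset as Unix epoch seconds here is an assumption, as is the added X-RateLimit-Limit header.

```python
import math


def too_many_requests(limit: int, reset_at: float, now: float):
    """Build a 429 status code and headers for a rejected request.

    reset_at is the Unix time at which the client's window frees up.
    """
    retry_after = max(1, math.ceil(reset_at - now))  # Retry-After: delta-seconds
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        # Assumption: epoch seconds; some APIs send seconds-until-reset instead
        "X-RateLimit-Reset": str(math.ceil(reset_at)),
    }
    return 429, headers
```

Rounding Retry-After up, with a floor of one second, keeps clients from retrying a moment before the window actually frees up.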

Limitation: an application-level limiter (nginx, HAProxy, Kubernetes Ingress) only works if the volumetric attack is already filtered at the edge. Under a 100 Gbps UDP flood, your nginx will not see a single HTTP request because the link in front of it is saturated. Rate limiting is the last line of defense, not the first; without edge filtering, relying on it is pointless.
Automated DDoS mitigation: from detection to blackhole in seconds
Manual scrubbing activation is exactly those 6 minutes from the beginning of the article: the engineer saw the alert, went to the wiki, typed a command, made a mistake, typed it again. Automation cuts response time to seconds but requires calibration: a false positive leads to a self-inflicted DoS.

FastNetMon is an open-source DDoS anomaly detector working on NetFlow/sFlow/IPFIX (actively maintained, 3.5k+ stars on GitHub). It computes a baseline (packets per second, bits per second, flows per second) for each /32 and triggers an action when traffic exceeds that baseline N times over (usually 3-5x). Requirements: GNU/Linux (Ubuntu 22.04+/Debian 12+), 4+ CPU cores, 8 GB RAM, switches with NetFlow v9 or sFlow support.
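The "N times above baseline per /32" idea can be sketched in a few lines. This illustrates the general technique only and is not FastNetMon's code: it tracks a single metric (pps), and the exponentially weighted moving average, the smoothing factor and the warm-up length are assumptions chosen for the example.

```python
class BaselineDetector:
    """Per-/32 EWMA baseline with an 'N times above baseline' trigger."""

    def __init__(self, multiplier: float = 3.0, alpha: float = 0.1, warmup: int = 10):
        self.multiplier = multiplier  # the text's typical range is 3-5x
        self.alpha = alpha            # EWMA smoothing factor (assumption)
        self.warmup = warmup          # samples required before triggering
        self.baseline = {}
        self.samples = {}

    def observe(self, ip: str, pps: float) -> bool:
        """Feed one measurement interval; return True if mitigation should trigger."""
        base = self.baseline.get(ip)
        if base is None:
            self.baseline[ip], self.samples[ip] = pps, 1
            return False
        if self.samples[ip] >= self.warmup and pps > self.multiplier * base:
            return True  # do not fold attack traffic into the baseline
        self.baseline[ip] = (1 - self.alpha) * base + self.alpha * pps
        self.samples[ip] += 1
        return False
```

Skipping the baseline update while an anomaly is active matters: otherwise a long attack slowly becomes the new normal and the trigger releases itself.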

Action options when triggered:
• RTBH (Remotely Triggered Black Hole) - announcing the attacked /32 with a BGP community that makes the upstream route it to a blackhole next-hop. On Juniper: routing-options static route X.X.X.X/32 discard; on Cisco: ip route X.X.X.X 255.255.255.255 Null0. The IP becomes unreachable, but the rest of the infrastructure survives. A blunt option for when the IP can be temporarily sacrificed.
• BGP FlowSpec - granular filtering: drop only UDP on port 53 from a specific source while leaving TCP untouched. Not every upstream provider supports it, so check in advance, not during the attack.
• Webhook to the scrubbing API - activates redirection automatically through the provider's API, without manual intervention.
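Because a false positive blackholes a live address, automation that emits RTBH routes benefits from a guard rail. The sketch below renders the static routes quoted above for a single IPv4 host (using Junos set syntax) and refuses addresses in a protected list. The function name and the protected-network idea are illustrative; 65535:666 is the RFC 7999 well-known BLACKHOLE community, but many upstreams expect their own ASN:666-style community, so check their documentation.

```python
import ipaddress

# RFC 7999 well-known BLACKHOLE community; your upstream may require its own
BLACKHOLE_COMMUNITY = "65535:666"


def rtbh_commands(ip: str, protected_networks=()) -> dict:
    """Render blackhole routes for one IPv4 host after a safety check."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 /32 routes")
    for net in protected_networks:
        # Refuse to blackhole infrastructure that must never go dark
        if addr in ipaddress.ip_network(net):
            raise ValueError(f"{addr} is in protected network {net}")
    return {
        "juniper": f"set routing-options static route {addr}/32 discard",
        "cisco": f"ip route {addr} 255.255.255.255 Null0",
        "community": BLACKHOLE_COMMUNITY,
    }
```

Typical protected entries would be DNS servers, the VPN concentrator and the monitoring stack itself: blackholing those turns a mitigation into an outage.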
Ansible + Terraform for cascading automation. When FastNetMon triggers, Ansible playbooks push ACLs to border routers (drop specific source ranges, rate-limit specific protocols), and Terraform spins up additional HAProxy cloud instances or scales out WAF nodes. Per NIST CSF RS.AN-01, every automation trigger generates a SOC task for manual verification: automation does not replace a human, it buys them time.
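One way to wire the cascade is a dry-run planner that builds the commands and the SOC ticket first and leaves execution to a separate, reviewable step. In this sketch the playbook name border_acl.yml, the variable names and the ticket fields are hypothetical; -e is ansible-playbook's extra-vars flag, and -auto-approve and -var are standard terraform apply flags.

```python
from dataclasses import dataclass, field


@dataclass
class ResponsePlan:
    commands: list = field(default_factory=list)  # argv lists, not yet executed
    soc_ticket: dict = field(default_factory=dict)


def build_response_plan(target_ip: str, vector: str, waf_nodes: int) -> ResponsePlan:
    """Dry-run plan for the cascade: router ACLs, extra WAF capacity, SOC ticket."""
    plan = ResponsePlan()
    # Hypothetical playbook that pushes ACLs to the border routers
    plan.commands.append([
        "ansible-playbook", "border_acl.yml",
        "-e", f"target_ip={target_ip}", "-e", f"vector={vector}",
    ])
    # Hypothetical Terraform variable controlling the WAF node count
    plan.commands.append([
        "terraform", "apply", "-auto-approve", f"-var=waf_node_count={waf_nodes}",
    ])
    # Every automated action leaves a task for a human to verify
    plan.soc_ticket = {
        "title": f"Automated DDoS response for {target_ip}",
        "vector": vector,
        "actions": [" ".join(cmd) for cmd in plan.commands],
        "status": "needs_manual_verification",
    }
    return plan
```

Executing each argv list with subprocess.run(cmd, check=True) is then a separate step, which keeps the plan testable and gives the SOC an exact record of what ran.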

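Prometheus alerting gives early warning before the main automation triggers:
```yaml
groups:
  - name: ddos-early-warning
    rules:
      - alert: DDoSTrafficAnomaly
        # Compare the current inbound rate with its own 7-day average rate
        # (via a subquery); comparing a rate with the raw counter is meaningless
        expr: >
          rate(node_network_receive_bytes_total{device!="lo"}[5m])
          > 2 * avg_over_time(
              rate(node_network_receive_bytes_total{device!="lo"}[5m])[7d:5m])
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Inbound traffic >2x baseline"
```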
Alerts on 2x baseline traffic, an error rate above 5% and a 4xx share above 20% are the signal to automatically tighten rate limiting or enable a challenge page. The 2x threshold is for early warning; FastNetMon, with its 3-5x threshold, already triggers active mitigation.
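The tiers can be combined into one decision function. This is one reading of the thresholds above, sketched under assumptions: the error-rate and 4xx signals alone are enough to tighten limits, 2x traffic on its own is only a warning, and the 3x mitigation threshold is the lower end of the 3-5x range and configurable.

```python
def response_tier(traffic_ratio: float, error_rate: float, share_4xx: float,
                  mitigation_multiplier: float = 3.0) -> str:
    """Map current signals to a response tier, strongest first.

    traffic_ratio is current inbound traffic divided by its baseline;
    error_rate and share_4xx are fractions between 0 and 1.
    """
    if traffic_ratio >= mitigation_multiplier:
        return "active_mitigation"     # FastNetMon-level trigger (3-5x)
    if error_rate > 0.05 or share_4xx > 0.20:
        return "tighten_or_challenge"  # tighten rate limits or serve a challenge page
    if traffic_ratio >= 2.0:
        return "early_warning"         # the Prometheus 2x alert
    return "normal"
```

Checking the strongest tier first guarantees that a 4x flood is never downgraded to a warning just because error rates have not climbed yet.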
 