Azure WAF: frame an emergency custom rule without losing evidence
Apply a temporary Azure WAF custom rule with priority, KQL evidence, business validation and rollback, without permanently hiding managed-rule signals.
An Azure WAF custom rule is often created under pressure: a visible attack, a blocked partner, an exposed administration path, or a managed rule that becomes too noisy during a release. The danger is not only writing the wrong condition. The real risk is adding a broad, high-priority rule and then losing track of which requests it hides, which OWASP signals disappear, and when the rule should be removed.
The use case here is a team that must quickly add a custom rule to an Application Gateway WAF policy. The goal is not to forbid urgent changes. It is to make them operable: clear scope, justified priority, before and after evidence, validity window, rollback threshold and a trace in the runbook.
Start from a precise symptom
A custom rule should answer an observable symptom. “Strengthen the WAF” is too vague. “Temporarily block POST requests to /login from an IP range that generates most failures” is actionable. “Allow all /api/import because a CRS rule bothers one customer” is probably too broad.
Before changing the policy, capture a short intake:
- affected URI, hostname, method, client or country
- observed time window
- visible WAF rules or application errors
- expected decision: block, allow or log
- example request that must match
- example request that must not match
- rollback criterion
This short intake keeps an incident reaction from becoming permanent configuration. It also forces the team to distinguish blocking, allowing and observation.
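The intake above can be captured as a small structured record so that an incomplete request is caught before anyone opens the policy. This is a minimal sketch: the `RuleIntake` class, its field names, `intake_problems` and the example hostname are hypothetical, chosen only to mirror the bullet list.

```python
from dataclasses import dataclass, fields

ALLOWED_DECISIONS = {"block", "allow", "log"}


@dataclass(frozen=True)
class RuleIntake:
    """Hypothetical intake record; one field per item in the list above."""
    scope: str               # affected URI, hostname, method, client or country
    time_window: str         # observed time window
    observed_signals: str    # visible WAF rules or application errors
    decision: str            # expected decision: block, allow or log
    must_match: str          # example request that must match
    must_not_match: str      # example request that must not match
    rollback_criterion: str  # what triggers removal of the rule


def intake_problems(intake: RuleIntake) -> list[str]:
    """Return the reasons an intake is not ready; an empty list means usable."""
    problems = [
        f"missing: {f.name}"
        for f in fields(intake)
        if not getattr(intake, f.name).strip()
    ]
    if intake.decision.strip().lower() not in ALLOWED_DECISIONS:
        problems.append("decision must be block, allow or log")
    if intake.must_match.strip() and intake.must_match == intake.must_not_match:
        problems.append("matching and non-matching examples are identical")
    return problems


# Illustrative values only; the hostname and time window are placeholders.
login_noise = RuleIntake(
    scope="POST /login on app.example.com from one source range",
    time_window="last 30 minutes",
    observed_signals="spike of failed logins, no managed-rule hits",
    decision="block",
    must_match="POST /login from the confirmed source range",
    must_not_match="POST /login from any other source",
    rollback_criterion="legitimate users report blocked logins",
)
print(intake_problems(login_noise))
```

A vague request such as "strengthen the WAF" fails this check on the decision field alone, which is the point: the record forces a choice between blocking, allowing and observing.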
Understand priority impact
On Application Gateway WAF, custom rules are evaluated before managed rules, in ascending priority order: the lower the number, the earlier the rule runs. A matching Block rule stops the request before OWASP/CRS or DRS rules produce their signals. A matching Allow rule also ends evaluation, so an overly broad Allow lets through a request that a managed rule would otherwise have blocked. Priority is therefore not just visual ordering: it is a diagnostic decision.
Before publication, review five points:
- whether the rule is evaluated earlier (has a lower priority number) than necessary
- whether the condition limits hostname, path, method or source
- whether Allow is really required
- whether the rule hides a managed rule that is useful for analysis
- whether Log would be enough for a few minutes to measure the scope
A useful reflex is to start with KQL analysis or a Log action when the urgency allows it. If the rule must block immediately, scope it no wider than the observed symptom, and narrower where possible.
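The effect of priority and action on evidence can be illustrated with a toy evaluation loop. This is a deliberately simplified model, not the WAF engine: it assumes custom rules run in ascending priority order, that Block and Allow stop evaluation while Log lets it continue, and it reduces managed rules to a list of rule IDs that would have fired. The `CustomRule` class, the `evaluate` function, the rule names and the managed rule ID are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

Request = dict[str, str]


@dataclass(frozen=True)
class CustomRule:
    name: str
    priority: int   # lower value is evaluated first
    action: str     # "Block", "Allow" or "Log"
    matches: Callable[[Request], bool]


def evaluate(
    request: Request,
    custom_rules: list[CustomRule],
    managed_rule_ids: list[str],
) -> tuple[str, list[str]]:
    """Return (decision, signals) for one request in this simplified model.

    managed_rule_ids lists the managed rules that would fire on the request
    if evaluation reached them.
    """
    signals: list[str] = []
    for rule in sorted(custom_rules, key=lambda r: r.priority):
        if not rule.matches(request):
            continue
        signals.append(f"custom:{rule.name}:{rule.action}")
        if rule.action in ("Block", "Allow"):
            # Terminating action: managed rules never see the request.
            return rule.action, signals
    # Reached only when no terminating custom rule matched.
    signals.extend(f"managed:{rule_id}" for rule_id in managed_rule_ids)
    return ("Block" if managed_rule_ids else "Allow"), signals


def is_import(request: Request) -> bool:
    return request["path"].startswith("/api/import")


request = {"path": "/api/import/batch", "method": "POST"}
would_fire = ["942100"]  # illustrative managed rule ID

broad_allow = CustomRule("allow-api-import", 10, "Allow", is_import)
log_first = CustomRule("log-api-import", 10, "Log", is_import)

print(evaluate(request, [broad_allow], would_fire))
print(evaluate(request, [log_first], would_fire))
```

With the broad Allow rule, the managed-rule signal never appears in the output. With the Log rule, both the custom hit and the managed signal remain visible, which is why a short logging phase helps measure scope before committing to Block or Allow.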
Example: temporarily block attack noise
Imagine an application published behind Application Gateway WAF. During a short window, /login receives abnormal POST volume from a known source range. Legitimate login must remain open for normal users. A custom rule can block that source on that precise path, with documented priority and an operational expiration.
The rule definition should stay explicit:
- policy: wafpol-app-prod
- rule name: block-login-noise-20260605
- action: Block
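- priority: 20 (Application Gateway custom rules accept priorities from 1 to 100; lower values are evaluated first)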
- path condition: /login
- method condition: POST
- source condition: the confirmed source range only
This example does not replace a full security analysis. It illustrates a bounded rule: path, method, source and dated name. The dated name is intentional. It makes the rule easier to find during review and reminds the team that it should not remain invisible in the policy.
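As a sketch, the same definition can be written as the JSON object that an ARM template or REST call would carry in the policy's `customRules` array. The helper name `build_block_rule` is hypothetical, the source range `203.0.113.0/24` is a documentation placeholder rather than a real attacker range, and the priority is an illustrative value inside the 1 to 100 range that Application Gateway custom rules accept. Verify property names against the API version you deploy with.

```python
import ipaddress
import json


def build_block_rule(
    name: str, priority: int, path: str, method: str, source_cidr: str
) -> dict:
    """Build one bounded Block rule as a customRules entry (sketch)."""
    if not 1 <= priority <= 100:
        raise ValueError("priority must be between 1 and 100")
    # Raises ValueError if the range is malformed, before it reaches the policy.
    ipaddress.ip_network(source_cidr)
    return {
        "name": name,
        "priority": priority,
        "ruleType": "MatchRule",
        "action": "Block",
        # All conditions of a rule must match (logical AND).
        "matchConditions": [
            {
                # BeginsWith also matches longer URIs starting with the path;
                # tighten the operator if that is too wide.
                "matchVariables": [{"variableName": "RequestUri"}],
                "operator": "BeginsWith",
                "matchValues": [path],
            },
            {
                "matchVariables": [{"variableName": "RequestMethod"}],
                "operator": "Equal",
                "matchValues": [method],
            },
            {
                "matchVariables": [{"variableName": "RemoteAddr"}],
                "operator": "IPMatch",
                "matchValues": [source_cidr],
            },
        ],
    }


rule = build_block_rule(
    name="block-login-noise-20260605",
    priority=20,                   # illustrative value
    path="/login",
    method="POST",
    source_cidr="203.0.113.0/24",  # placeholder for the confirmed source range
)
print(json.dumps(rule, indent=2))
```

Keeping the three conditions in one rule is what makes it bounded: removing any of them widens the block to every method, every path or every source.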
Prepare KQL evidence before the change
An emergency rule must be observed as soon as it is created. The follow-up query should answer three questions: does the rule match, is the volume consistent with the symptom, and are unexpected paths affected?
The monitoring query should filter the Application Gateway firewall logs over a short window, isolate the temporary rule name, then summarize hits by time bucket, hostname, request URI and action.
Depending on ingestion mode, the logs land either in AzureDiagnostics or in a dedicated table such as AGWFirewallLogs, and field names differ between the two. Keep the query in the runbook exactly as it runs against the table the environment actually uses, not as an idealized version.
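One way to keep a single runbook entry for both ingestion modes is to generate the KQL text from a small schema description. The sketch below is hedged on several points: the column names for each table are assumptions to check against the workspace, and it also assumes that the custom rule name appears in the rule ID column of the firewall log. `FirewallLogSchema` and `rule_hits_query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallLogSchema:
    """Table and column names for one ingestion mode (assumed, verify locally)."""
    table: str
    rule_col: str
    host_col: str
    action_col: str
    uri_col: str
    extra_filter: str = ""


AZURE_DIAGNOSTICS = FirewallLogSchema(
    table="AzureDiagnostics",
    rule_col="ruleId_s",
    host_col="hostname_s",
    action_col="action_s",
    uri_col="requestUri_s",
    extra_filter='Category == "ApplicationGatewayFirewallLog"',
)
RESOURCE_SPECIFIC = FirewallLogSchema(
    table="AGWFirewallLogs",
    rule_col="RuleId",
    host_col="Hostname",
    action_col="Action",
    uri_col="RequestUri",
)


def rule_hits_query(
    schema: FirewallLogSchema,
    rule_name: str,
    lookback: str = "30m",
    bucket: str = "5m",
) -> str:
    """Build a KQL query that summarizes hits for one custom rule."""
    if '"' in rule_name:
        raise ValueError("rule name must not contain double quotes")
    lines = [schema.table, f"| where TimeGenerated > ago({lookback})"]
    if schema.extra_filter:
        lines.append(f"| where {schema.extra_filter}")
    lines += [
        f'| where {schema.rule_col} == "{rule_name}"',
        "| summarize hits = count() by "
        f"bin(TimeGenerated, {bucket}), {schema.host_col}, "
        f"{schema.action_col}, {schema.uri_col}",
        "| order by TimeGenerated asc",
    ]
    return "\n".join(lines)


print(rule_hits_query(AZURE_DIAGNOSTICS, "block-login-noise-20260605"))
```

Grouping by request URI in addition to hostname and action is what answers the third question: any path other than /login in the result means the rule reaches further than intended.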
Define rollback before pressing enter
Custom-rule rollback should not be improvised after the first user reports. The team must know how to disable the rule, which signal triggers rollback and who validates the decision.
The rollback procedure should include:
- the exact rule name
- the command or portal path used to disable it
- the signal that triggers rollback
- the person or team allowed to confirm the decision
- the evidence to keep after rollback
The threshold can be simple: increased application errors on /login, blocked expected sources, no reduction in noise, or inability to tie logs back to the rule. A rule that does not produce usable evidence should be reviewed quickly.
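The four signals can be turned into an explicit check so that the rollback decision does not depend on who is on call. This is a minimal sketch: the `Observation` fields, the `rollback_reasons` function and the 20% error tolerance are assumptions to adapt, and the counts would come from the environment's own application and firewall queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Counts taken over comparable windows before and after the rule."""
    login_errors_before: int
    login_errors_after: int
    blocked_expected_sources: int  # known-good sources seen in Block hits
    noise_before: int              # abnormal requests before the rule
    noise_after: int               # abnormal requests still reaching the app
    hits_attributed_to_rule: int   # log entries carrying the rule name


def rollback_reasons(obs: Observation, error_tolerance: float = 1.2) -> list[str]:
    """Return the triggered rollback signals; an empty list means keep the rule."""
    reasons = []
    if obs.login_errors_after > obs.login_errors_before * error_tolerance:
        reasons.append("application errors on /login increased")
    if obs.blocked_expected_sources > 0:
        reasons.append("expected sources are blocked")
    if obs.noise_after >= obs.noise_before:
        reasons.append("no reduction in noise")
    if obs.hits_attributed_to_rule == 0:
        reasons.append("logs cannot be tied back to the rule")
    return reasons


healthy = Observation(
    login_errors_before=40,
    login_errors_after=35,
    blocked_expected_sources=0,
    noise_before=5000,
    noise_after=120,
    hits_attributed_to_rule=4880,
)
print(rollback_reasons(healthy))
```

The returned reasons double as the evidence to keep after rollback, since they record which signal triggered the decision.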
Do not turn urgency into permanent debt
After stabilization, three options exist. The rule can be removed if the traffic disappears. It can become a permanent control if it expresses a real publication policy. It can also be replaced by a targeted exclusion if the original problem was a managed-rule false positive.
Review the rule after 24 or 48 hours:
- does the rule still have hits?
- do hits match the expected scope?
- are managed rules still useful and visible?
- is the Block, Allow or Log action still justified?
- should the rule be removed, documented or migrated to IaC?
This review matters for policy readability. An accumulation of dated custom rules that are never removed makes the WAF hard to explain during the next incident.
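The dated naming convention makes this review easy to automate. The sketch below assumes rule names end in a `-YYYYMMDD` suffix that records the creation date, as in the example rule, and flags any rule at least 48 hours old. The function name and the sample rule names other than `block-login-noise-20260605` are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Matches a trailing eight-digit date such as "-20260605".
DATE_SUFFIX = re.compile(r"-(\d{8})$")


def rules_due_for_review(
    rule_names: list[str],
    today: date,
    max_age: timedelta = timedelta(hours=48),
) -> list[str]:
    """Return dated rule names whose date suffix is at least max_age old."""
    due = []
    for name in rule_names:
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # undated rule: outside the scope of this check
        try:
            created = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a valid calendar date
        if today - created >= max_age:
            due.append(name)
    return due


policy_rules = [
    "block-login-noise-20260605",
    "block-scanner-20260607",
    "allow-partner-import",
]
print(rules_due_for_review(policy_rules, today=date(2026, 6, 8)))
```

Run against the rule names exported from the policy, a check like this surfaces temporary rules before they accumulate. Undated rules are skipped here and still need a manual pass.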
Conclusion
An emergency Azure WAF custom rule is acceptable if it remains bounded and provable. The change should start from a precise symptom, limit scope, document priority, monitor the effect in KQL and keep rollback explicit.
The question is not only “does the rule block?” It is “do we know what it blocks, what it hides, when to remove it and how to prove that legitimate traffic remains protected?” That discipline turns a fast reaction into controlled operations.