pcep.lol

It starts as a feature request. “We just need better traffic engineering.” “We just need centralized visibility.” You spin up a couple of compute nodes. Virtual machines. Maybe a Kubernetes cluster. They sit quietly in a rack somewhere — or worse, in a shared IT virtualization farm. They look harmless.

They are not harmless.

The moment you enable PCEP, you are no longer trusting your network to the distributed control plane you built and hardened over years. You are trusting it to a handful of compute nodes. Servers. Hypervisors. Storage arrays. Change windows owned by IT. Patch cycles. Firmware updates. Shared infrastructure.

Your network’s reliability now depends on systems that were never designed to carry that burden.

Routers are purpose-built. Deterministic. Hardened. They run minimal services. They have one job: converge routing and forward traffic. A PCE runs on general-purpose compute. It shares fate with vMotion events. Storage latency. Kernel updates. Security scans. Capacity contention. Someone else’s reboot.

When those compute nodes blink — even briefly — your network blinks with them.

A failed PCE is not just “an application outage.” It is a control-plane authority disappearing. Active stateful PCEP means your routers delegate control of their LSPs to the controller. It can instantiate, modify, and tear down tunnels. When it is unreachable, behavior changes. Keepalives stop. Dead timers expire. Sessions reset. Delegated LSPs lose their owner. Policies stall. Convergence becomes conditional.
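How long does the blink last? Here is a rough sketch of the timeline a router (the PCC) walks through when its PCE goes silent. The 30-second keepalive and the 4x dead timer are the defaults RFC 5440 recommends. The redelegation and state timeouts come from RFC 8231, which leaves their values to the implementation, so the numbers below are assumptions for illustration, as are the names `PccTimers` and `outage_timeline`. Check your vendor's actual defaults.

```python
from dataclasses import dataclass


@dataclass
class PccTimers:
    keepalive_s: int = 30             # RFC 5440 recommended default
    dead_timer_s: int = 120           # RFC 5440 recommends 4x the keepalive
    redelegation_timeout_s: int = 30  # ASSUMPTION: implementation-specific value
    state_timeout_s: int = 60         # ASSUMPTION: implementation-specific value


def outage_timeline(t: PccTimers) -> list[tuple[int, str]]:
    """Return (seconds since the last PCE message, event) for a silent PCE failure."""
    # As I read RFC 8231, the state timeout must not be shorter than the
    # redelegation timeout, so reject configurations that violate that.
    if t.state_timeout_s < t.redelegation_timeout_s:
        raise ValueError("state timeout must be >= redelegation timeout")

    session_down = t.dead_timer_s
    return [
        (0, "last message received from the PCE"),
        (session_down, "DeadTimer expires; PCC declares the PCEP session down"),
        (session_down + t.redelegation_timeout_s,
         "redelegation timeout; PCC revokes delegation and looks for another PCE"),
        (session_down + t.state_timeout_s,
         "state timeout; PCC flushes PCE-driven LSP state and reverts to local defaults"),
    ]


if __name__ == "__main__":
    for seconds, event in outage_timeline(PccTimers()):
        print(f"t+{seconds:>4}s  {event}")
```

With these assumed values, a router can spend two full minutes believing in a controller that is already gone, and another minute deciding what to do about it. Tighten the timers and you trade that window for sessions that flap on every storage hiccup.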

And if that controller was mid-change?

You’ve centralized your blast radius.

Now imagine the failure domains.
Power maintenance in the data center where your PCE cluster runs.
A hypervisor patch that reboots all hosts.
A storage array hiccup that freezes I/O.
A firewall policy update that blocks TCP/4189 for just long enough.
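That last one is at least cheap to watch for. A minimal sketch of a reachability probe, run from a host that shares the routers' path to the PCE: it only proves that a TCP handshake to port 4189 completes, not that a PCEP session would actually open. The function name `pcep_port_reachable` and the hostname `pce1.example.net` are hypothetical.

```python
import socket

PCEP_PORT = 4189  # IANA-registered TCP port for PCEP


def pcep_port_reachable(host: str, port: int = PCEP_PORT, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout_s."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    host = "pce1.example.net"  # hypothetical PCE address
    state = "reachable" if pcep_port_reachable(host) else "NOT reachable"
    print(f"{host}:{PCEP_PORT} is {state}")
```

A firewall that silently drops packets shows up here as a timeout rather than a refusal, which is why the timeout is kept short.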

Your routers will keep forwarding packets. But your engineered paths? Your SR-TE policies? Your centrally computed protection paths? They now depend on infrastructure managed by teams who do not carry the pager for MPLS core outages.

You’ve handed your network’s determinism to “the server team.”

And there’s more.

Security posture changes. That PCE cluster becomes a strategic target. Compromise it, and an attacker doesn’t need to log into routers. They don’t need CLI access. They can redirect traffic centrally. Intercept flows. Blackhole segments. Push malicious paths at scale. The PCE is a command-and-control plane for your backbone.

It’s efficient. That’s the problem.

Then comes the operational addiction.

At first, you keep distributed TE as a fallback. Over time, you lean into centralized policy. You depend on it for scale. For elasticity. For rapid provisioning. Soon, turning it off isn’t a rollback — it’s an architectural crisis. Your topology design assumes it. Your runbooks assume it. Your engineers assume it.

You won’t even remember what the network looked like before.

And when something goes wrong, troubleshooting shifts from deterministic routing math to opaque controller logic. “Why did the PCE choose that path?” becomes a question that spans telemetry pipelines, policy engines, database state, and cluster health.

You didn’t just add a protocol.
You added a distributed system.

Every distributed system fails. Not if. When.

Before deploying PCEP, ask yourself:

Are you prepared for your network’s stability to depend on compute nodes that can be rebooted by IT during a maintenance window?

Are you prepared for your backbone to hinge on hypervisors, storage latency, and patch compliance?

Are you prepared for a controller outage to become a routing event?

Because once you centralize path authority, you have concentrated risk.

PCEP.

Not even once.