Ahmet Alp Balkan 11/15/2024

Tale of a Kubernetes node-feature-discovery incident

A detailed post-mortem of a Kubernetes incident in which upgrading the node-feature-discovery (NFD) component caused serious problems at scale. The new version changed architecture to store node features in NodeFeature custom resources, which consumed excessive etcd storage (~140 KB per node) in large production clusters and broke pod scheduling. The article covers the decision to roll back and offers lessons on evaluating off-the-shelf components for large-scale operations.
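
To give a feel for why ~140 KB per node matters, here is a rough back-of-the-envelope sketch in Python. It only multiplies the per-node figure from the summary by a node count. The function name and the node counts are hypothetical, chosen purely for illustration; they are not taken from the original article, and "KB" is assumed to mean decimal kilobytes.

```python
# Rough estimate of etcd space taken by NodeFeature objects.
# Assumption: ~140 KB per node (from the summary), with KB read as 1000 bytes.
KB = 1000


def nodefeature_footprint_bytes(node_count: int, bytes_per_node: int = 140 * KB) -> int:
    """Return the approximate total size of one NodeFeature object per node."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return node_count * bytes_per_node


if __name__ == "__main__":
    # Hypothetical cluster sizes, for illustration only.
    for nodes in (100, 1_000, 5_000):
        total = nodefeature_footprint_bytes(nodes)
        print(f"{nodes:>6} nodes -> ~{total / 1_000_000:.0f} MB")
```

The estimate ignores etcd revision history, watch caches, and any other per-object overhead, so real usage could differ; it is only meant to show that the cost grows linearly with cluster size.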
