Tale of a Kubernetes node-feature-discovery incident
A detailed post-mortem of a Kubernetes incident in which upgrading the node-feature-discovery (NFD) component caused major scaling problems. The new version's architectural shift to NodeFeature custom resources consumed excessive etcd storage (~140 KB per node) in large production clusters, which broke pod scheduling. The article covers the decision to roll back and draws lessons on evaluating off-the-shelf components for large-scale operations.
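To get a feel for why ~140 KB per node matters at scale, here is a rough back-of-the-envelope sketch in Python. Only the 140 KB figure comes from the summary. The node counts are hypothetical, and the 2 GiB quota is an assumption based on etcd's commonly cited default backend quota, so check it against the actual cluster configuration. The estimate also ignores everything else stored in etcd and any uncompacted revision history, so real usage would likely be higher.

```python
# Rough estimate of etcd space taken by NodeFeature objects alone.
# Assumptions: one NodeFeature object per node, ~140 KB each (from the
# summary), and a 2 GiB etcd backend quota (assumed default; verify locally).

BYTES_PER_NODEFEATURE = 140 * 1000        # ~140 KB per node
ASSUMED_ETCD_QUOTA_BYTES = 2 * 1024 ** 3  # 2 GiB, an assumption


def nodefeature_footprint_bytes(node_count, bytes_per_node=BYTES_PER_NODEFEATURE):
    """Total bytes used by NodeFeature objects for a given node count."""
    return node_count * bytes_per_node


def quota_fraction(node_count, quota_bytes=ASSUMED_ETCD_QUOTA_BYTES):
    """Fraction of the assumed etcd quota consumed by NodeFeature objects."""
    return nodefeature_footprint_bytes(node_count) / quota_bytes


# Hypothetical cluster sizes, chosen only for illustration.
for nodes in (100, 1000, 5000):
    used_mb = nodefeature_footprint_bytes(nodes) / 1_000_000
    print(f"{nodes:>5} nodes: {used_mb:7.1f} MB, "
          f"{quota_fraction(nodes):.1%} of assumed quota")
```

The point of the sketch is that the cost grows linearly with node count, so a per-object size that looks harmless in a small test cluster can claim a large share of etcd's budget in a cluster with thousands of nodes.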