In simple terms, antifragility is the ability of your team to come together and thrive through radical changes and disruptions. Nassim Nicholas Taleb coined and popularized the term in his book “Antifragile: Things That Gain from Disorder.” Applied to Information Technology (IT), and specifically to system architecture, it is the principle of building systems that grow stronger over time in the face of stress and failure. Many developers of large-scale systems follow this principle. Their systems are tested constantly by injecting faults, and each round of testing is used to make them more resilient and better at responding over time. Some companies deliberately take down servers, network segments, data centers, and even entire regions without warning, expecting the overall system either to degrade service gracefully or to keep functioning without interrupting the customer experience. Failure becomes an opportunity to improve and to learn how to prevent future failures, including kinds that have never occurred before. Most system designers are used to building systems that resist or recover from failure, but continuous improvement is the core principle of antifragile practice.

The antifragile concept can be applied to your business to identify risk areas and propose solutions that help IT services improve over time. Antifragility is not a one-time event like a typical assessment. Nor is it a punch list that leaves systems resilient once every task is checked off. It is an ongoing, feedback-driven process, which means the participants must change as well by:

  • Understanding that IT is first and foremost a service industry intended to meet the needs of the business users
  • Replacing blame and fear of failure with understanding and learning
  • Taking personal responsibility for professional growth so that no one lets their skill set go stale
  • Breaking down complacency so that problems are identified and fixed rather than covered up with unsustainable, unacceptable workarounds
  • Requiring everyone to be responsible for improving processes, expanding automation, and reducing the likelihood of failures
  • Testing and verifying assumptions in a constant search for a “better way to do something”
  • Diagnosing and fully understanding every service interruption and failure so future failures become less likely

A prioritized risk matrix provides a rough summary of each potential failure's impact and likelihood. Combining the two estimates shows how much overall risk would be reduced by addressing a specific problem, which in turn sets priorities. An issue that is very likely to occur but has low impact is far less critical than a rare problem that can have catastrophic effects. A catastrophic problem that has not yet been identified is the worst possible risk, since it cannot be quantified or planned for directly.
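The likelihood-versus-impact trade-off can be made concrete in a few lines of code. The sketch below is one possible approach, assuming each risk is estimated as an annual failure rate and a dollar cost per failure, and scoring it as expected annual loss (likelihood × impact, the same idea as annualized loss expectancy in security risk analysis). The `Risk` class, the `prioritize` function, the field names, and all numbers are hypothetical illustrations, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One row of a hypothetical risk matrix."""
    name: str
    failures_per_year: float            # estimated likelihood (annual rate)
    cost_per_failure: float             # estimated impact in dollars
    failures_per_year_after_fix: float  # estimated rate if the proposed fix is made

    def expected_annual_loss(self) -> float:
        # Expected loss = likelihood x impact
        return self.failures_per_year * self.cost_per_failure

    def expected_reduction(self) -> float:
        # Expected annual loss removed by the proposed fix
        after = self.failures_per_year_after_fix * self.cost_per_failure
        return self.expected_annual_loss() - after


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Order risks so the fix with the largest expected reduction comes first."""
    return sorted(risks, key=lambda r: r.expected_reduction(), reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only; not measurements from any real system.
    risks = [
        Risk("Nightly batch job times out", 12, 500, 2),
        Risk("Primary data center outage", 0.05, 2_000_000, 0.01),
    ]
    for risk in prioritize(risks):
        print(f"{risk.name}: expected loss ${risk.expected_annual_loss():,.0f}/yr, "
              f"fix saves ${risk.expected_reduction():,.0f}/yr")
```

With these made-up figures, the frequent batch-job timeout costs about $6,000 a year, while the rare data center outage carries $100,000 of expected loss, so the rare, catastrophic failure rises to the top. Multiplying two 1-to-5 ratings would score these two risks as equal (5 × 1 versus 1 × 5), which is one reason to estimate impact in real units where possible.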

In addition, a critical component of antifragility is making things better after a failure rather than simply fixing the point-in-time issue. As stresses or failures are identified, systems are not just modified to address that single failure; they are made more robust against unknown and unpredictable sources of failure.

Dewpoint can help your business and your team become stronger during turbulent times. To learn more about antifragility, contact us to start a discussion.
