I cannot compute all the money that has been extorted from you - in hidden taxes, in regulations, in wasted time, in a lost effort, in energy spent to overcome artificial obstacles. I cannot compute the sum, but if you wish to see its magnitude - look around you. (Ayn Rand, [0])
I scribbled this extract in my notepad more than 7 years ago, and it has made more and more sense as time passes. In my humble opinion, both books, "Atlas Shrugged" and "The Phoenix Project", have strong potential to wake the curious reader from a long-lasting slumber. Anyone who has been in IT for many years will start recognizing the characters of "The Phoenix Project" as if they were colleagues. It would be improper to retell the story, but what the protagonist discovers is the basis for this article. The selected accompanying quotes from (Ayn Rand, [0]) are set in italics.
Decades of lessons learned from manufacturing, high-reliability organizations, high-trust management models, and others have brought us to the DevOps practices we know today. The focus on DevOps is strong in the book, but it should not eclipse the other useful practices and recommendations that are also present.
While more and more fancy processes, frameworks, and guidelines are created, there is a scientific and fundamental way to do things that holds through time and is easily verifiable. The whole point of the "Phoenix" is that only through experiments can you prove that something is applicable and working in your team or organization. One could assess one's own gaps in steps 2 and 3 of the Feynman Technique.
Don't let the bold-sounding names puzzle you or make you think that they declare the one and only best way:
... observe with what passionate consistency the mystics of muscle are striving to make you forget that a concept such as 'mind' has ever existed. Observe the twists of undefined verbiage, the words with rubber meanings, the terms left floating in midstream, by means of which they try to get around the recognition of the concept of 'thinking.' (Ayn Rand, [0])
The Chaos
Left unchecked, disorder increases over time. Energy disperses, and systems dissolve into chaos. Something in your organization needs improvement (or you likely wouldn't be reading this).
Downward spiral
Before starting the descent down the spiral, it is vital to recall the four categories of work, each of which can amplify the chaos, as in the butterfly effect:
- Business projects
- Internal IT Operations projects
- Changes
- Firefighting, aka "unplanned work", which is created when something is done wrong.
You are in debt
The first act: you are already in debt. Technical debt and daily workarounds are a constant part of our lives, and we always promise ourselves that we'll fix the mess when there is a little more time. But that time never comes. Just like financial debt, the compounding interest costs grow over time. If an organization doesn't pay down its technical debt, every calorie in the organization can be spent just paying interest, in the form of unplanned work.
Yet another urgent project
The next act begins when somebody has to compensate for the latest broken promise (no matter at which level it was given). As a result, Development is tasked with another urgent project that inevitably requires solving new technical challenges and cutting corners to meet the promised release date, further adding to technical debt, all with the same promise to fix it later, as in the first act.
Heavy burden of debt
The final act is where everything becomes just a little more difficult, bit by bit: everybody gets a little busier, work takes a little more time, communications become a little slower, and work queues get a little longer. Work requires more communication, coordination, and approvals.
Poor or absent DevOps as a marker of chronic conflict
DevOps smells:
- Production changes cannot be deployed in minutes or hours; they require weeks or months instead.
- Hundreds or thousands of changes per day cannot be deployed into production.
- Production deployments are not routine; they involve outages, chronic firefighting, and heroics.
In an age where competitive advantage requires fast time to market, high service levels, and relentless experimentation, these organizations are at a significant competitive disadvantage. This is in large part due to their inability to resolve a core, chronic conflict within their technology organization. In other words, the conflict arises when organizational measurements and incentives across different silos prevent the achievement of global, organizational goals.
The collateral damage of chronic conflict
Is Atlas Shrugged a mystery or a fortune-teller's story? The plot follows Dagny Taggart, an executive at the fictional Taggart Transcontinental Railroad Company, as she witnesses a bureaucratic crackdown on the industry. Each next step she takes meets more and more resistance.
The wheels begin grinding slower and slower and require more effort to keep turning.
The costs
When people are trapped in this downward spiral for years, they often feel stuck in a system that pre-ordains failure and leaves them powerless to change the outcomes. This powerlessness is often followed by burnout, with the associated feelings of fatigue, cynicism, and even hopelessness and despair.
Many psychologists assert that creating systems that cause feelings of powerlessness is one of the most damaging things one can do to fellow human beings: it deprives other people of their ability to control their own outcomes and even forces a culture where people are afraid to do the right thing because of fear of punishment, failure, or jeopardizing their livelihood. This can create the conditions of learned helplessness, where people become unwilling or unable to act in a way that avoids the same problem in the future.
It means long hours, working on weekends, and a decreased quality of life, not only for the employee but also for their family and friends. It is not surprising that when this occurs, the best people leave (except those who feel that they can't, out of a sense of duty or obligation).
Industrialists and inventors led by the powerful John Galt strike back at the government by disappearing and effectively draining society of its great thinkers. They describe this problem in other words:
"To interpose the threat of physical destruction between a man and his perception of reality, is to negate and paralyze his means of survival; to force him to act against his own judgment, is like forcing him to act against his own sight. Whoever, to whatever purpose or extent, initiates the use of force, is a killer acting on the premise of death in a manner wider than murder: the premise of destroying man's capacity to live." (Ayn Rand, [0])
Crossing the chasm
The Lean movement is based on a huge legacy
Techniques such as Value Stream Mapping, Kanban Boards, and Total Productive Maintenance were codified for the Toyota Production System in the 1980s.
The Agile Manifesto, created in 2001, has one key principle: "deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale".
Back in 2009, the seminal presentation "10+ Deploys Per Day: Dev and Ops Cooperation at Flickr" made a great historical impact on the industry.
Breaking the downward spiral
Generating organizational learning after a fix or an accident enables you to prevent the problem from recurring and allows you to detect and correct similar problems faster in the future.
Everyone is constantly learning, fostering a hypothesis-driven culture where the scientific method is used to ensure nothing is taken for granted: nothing is done without measuring, and product development and process improvements are treated as experiments.
Instead of a culture of fear, a high-trust, collaborative culture is built, where people are rewarded for taking risks. They can fearlessly talk about problems as opposed to hiding them or putting them on the back burner; after all, problems must be seen to become solvable.
Everyone has ownership of their work, regardless of their role in the technology organization. They have confidence that their work matters and is meaningfully contributing to organizational goals, proven by their low-stress work environment and their organization's success in the marketplace.
The three (subsequent) ways (steps) to dissolve the chaos
The first way
The first way enables fast left-to-right flow of work from Development to Operations to Customer.
When long deployment lead times are observed, heroics are required at almost every stage of the value stream.
The practices for fast left-to-right flow include:
- Making work visible and limited (ideally, the kanban board spans the entire work stream)
- Reducing batch sizes and intervals of work
- Building quality by preventing defects from being passed downstream
- Constantly optimizing for the global goals
The visibility is required to alleviate or eliminate the following burdens and hardships in order to achieve the goal of fast flow.
Limiting WIP
Limiting WIP makes it easier to see problems that prevent the completion of work. You may find that there is nothing to do because you are waiting on someone else. Although the temptation to start new work is strong, the better action is to find out what is causing the delay and help fix that problem.
Stop starting. Start finishing. (David J. Anderson)
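A WIP limit can be sketched in a few lines. The class, the card names, and the limit of 2 below are hypothetical and serve only as an illustration; a real board would pick its limits by experiment.

```python
from dataclasses import dataclass, field


class WipLimitExceeded(Exception):
    """Raised when pulling new work would exceed the column's WIP limit."""


@dataclass
class KanbanColumn:
    name: str
    wip_limit: int  # hypothetical value; tune it per team by experiment
    cards: list = field(default_factory=list)

    def pull(self, card: str) -> None:
        """Start a card only while the column is below its WIP limit."""
        if len(self.cards) >= self.wip_limit:
            raise WipLimitExceeded(
                f"'{self.name}' is full ({self.wip_limit}); finish something first"
            )
        self.cards.append(card)

    def finish(self, card: str) -> None:
        """Completing a card frees capacity for the next one."""
        self.cards.remove(card)


in_progress = KanbanColumn(name="In progress", wip_limit=2)
in_progress.pull("deploy script")
in_progress.pull("db migration")

try:
    in_progress.pull("new feature")  # third card: rejected by the limit
    blocked = False
except WipLimitExceeded:
    blocked = True

in_progress.finish("deploy script")  # finish something first...
in_progress.pull("new feature")      # ...and now there is room
```

The point of the exception is not to punish anyone: it makes the blocked state visible, which is the moment to ask what is causing the delay.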
Reduce batch sizes
One of the key lessons in Lean is that in order to shrink lead times and increase quality, we must strive to continually shrink batch sizes. The theoretical lower limit for batch size is single-piece flow.
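The effect of batch size on lead time can be illustrated with a toy model. It rests on simplifying assumptions that are mine, not the book's: a single work center, all items available at time 0, one time unit of work per item, and a batch that is handed over only when its last item is finished.

```python
def delivery_times(total_items: int, batch_size: int, time_per_item: float = 1.0) -> list:
    """Return the time at which each item reaches the customer."""
    times = []
    clock = 0.0
    for start in range(0, total_items, batch_size):
        size = min(batch_size, total_items - start)
        clock += size * time_per_item   # work through the whole batch
        times.extend([clock] * size)    # every item in it ships together
    return times


def average_lead_time(total_items: int, batch_size: int) -> float:
    times = delivery_times(total_items, batch_size)
    return sum(times) / len(times)


for batch in (12, 4, 1):
    first = delivery_times(12, batch)[0]
    print(f"batch size {batch}: first delivery at t={first}, "
          f"average lead time {average_lead_time(12, batch)}")
```

In this model the total amount of work never changes, yet the average lead time for 12 items drops from 12 to 8 to 6.5 time units as the batch shrinks from 12 to 4 to 1, and the first item reaches the customer after 1 time unit instead of 12.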
Reduce number of handoffs
Even under the best circumstances, some knowledge is inevitably lost with each handoff. With enough handoffs, the work can completely lose the context of the problem being solved or the organizational goal being supported. This loss can be mitigated by reducing the number of handoffs, either by automating significant portions of the work or by reorganizing teams so they can deliver value to the customer themselves, instead of being constantly dependent on others.
Continually identify and elevate constraints
There are five focusing steps to deal with constraints:
- Identify the constraint
- Exploit the constraint - ensure that the constraint is not allowed to waste any time. Ever.
- Subordinate everything else to the constraint. In the Theory of Constraints, this is typically implemented with something called Drum-Buffer-Rope, which sets the tempo at which work is released and thereby controls WIP.
- Elevate the constraint
- If in the previous steps a constraint has been broken, go back to step one but do not allow inertia to cause a constraint.
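The focusing steps above can be sketched for a serial value stream. The work centers and their daily capacities are hypothetical numbers chosen for illustration, and the model assumes that the whole stream moves no faster than its slowest step.

```python
# Hypothetical capacities, in work items per day, for each work center.
capacity_per_day = {
    "development": 10,
    "code review": 6,
    "testing": 4,
    "deployment": 8,
}


def find_constraint(capacities: dict) -> str:
    """Step 1 (identify): the work center with the lowest capacity."""
    return min(capacities, key=capacities.get)


def system_throughput(capacities: dict) -> float:
    """A serial value stream cannot go faster than its slowest step."""
    return min(capacities.values())


def release_rate(capacities: dict) -> float:
    """Step 3 (subordinate): release new work at the pace of the constraint,
    so that queues (WIP) do not pile up in front of it."""
    return system_throughput(capacities)


constraint = find_constraint(capacity_per_day)
before = system_throughput(capacity_per_day)

# Improving a non-constraint does not change the result.
faster_dev = dict(capacity_per_day, development=20)
unchanged = system_throughput(faster_dev)

# Step 4 (elevate): add capacity at the constraint, then go back to step 1.
elevated = dict(capacity_per_day, testing=9)
new_constraint = find_constraint(elevated)
after = system_throughput(elevated)
```

With these numbers, testing is the constraint at 4 items per day. Doubling development capacity changes nothing, while elevating testing to 9 lifts throughput to 6 and moves the constraint to code review, which is why the last step sends us back to the first.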
Eliminate hardships and waste in the value stream
John Galt hated the outer world for its inefficiency and waste. And the very next moment he presents his utopian world:
...but that was not the way it worked in the outer world. Down what drain were they poured out there, our days, our lives, and our energy? Into what bottomless, futureless sewer of the unpaid-for?
Here, we trade achievements, not failures - values, not needs. (Ayn Rand, [0])
Waste is considered one of the largest threats to business viability.
The Lean definition of waste: "the use of any material or resource beyond what the customer requires and is willing to pay for."
The goal in Lean is to reduce hardship and drudgery in our daily work through continual learning to achieve organizational goals.
The technology value stream has the following categories of waste:
- Partially done work. Any work in the value stream that is not completed. This work becomes obsolete and loses value over time.
- Extra processes. Any work that does not add value for the customer and yet increases the lead time. This could be a set of review approvals or comprehensive documentation that is not used downstream.
- Extra features. Features that are needed neither by the customer nor by the organization; the KISS principle applies.
- Task switching. A person is assigned to multiple projects and value streams, which requires them to switch context and manage the dependencies between pieces of work. This adds effort and time in each value stream.
- Waiting. Any delay spent waiting for a resource or for other work to be done increases cycle time and prevents value from being delivered to the customer.
- Motion. People from different work centers that are not colocated have to communicate frequently with one another (the moving of information or materials). Handoffs are yet another example, as they require extra communication to resolve ambiguities.
- Defects. The additional effort to resolve incorrect, missing, or unclear information, materials, or products.
- Manual work. Ideally, any dependencies on Operations should be automated, self-serviced, and available on demand. Relying on manual or non-standard work from others is a waste.
- Heroics. Individuals and teams are put in a position where they must perform unreasonable acts, which may even become a part of their daily work.
The second way. May the feedback be with you
The second way enables a fast and constant flow of feedback from right to left at all stages of our value stream. It requires the amplification of feedback to prevent problems from happening again, or to enable faster detection and recovery. This means that quality is created at the source and knowledge is generated or embedded where it's needed. The goal is to create a safer and more resilient system of work.
Interesting that in the Chinese language the word problem or threat and the word opportunity are one and the same word: wēijī, 危机. This means every problem can be an opportunity in disguise and every opportunity can be a problem in disguise. (Adizes,[9])
When failures and accidents occur, they could be treated as opportunities for learning, as opposed to causing punishment and blame.
The purpose of management, leadership, parenting, or governing is exactly that: to manage change. To solve today’s problems that were generated in the past and get ready to deal with future problems we create with our decisions today. No management is needed when there are no problems, and there are no problems only when we are… Dead. (Adizes,[9])
As per Dr. Steven Spear: Designing perfectly safe systems is likely beyond our abilities, but it is possible to make it safer to work in complex systems when the following conditions are met:
- Complex work is managed so that problems in design and operations are revealed
- Problems are swarmed and solved, resulting in the quick construction of new knowledge
- New local knowledge is exploited globally throughout the organization
- Leaders create other leaders who grow these types of capabilities.
See problems as they occur
As per [Senge, 10], feedback loops are a critical part of learning organizations and systems thinking. Feedback and feedforward loops cause components within a system to reinforce or counteract each other.
In the technology value stream, poor results are often caused by the absence of fast feedback.
Swarm and solve problems to build new knowledge
In a Toyota manufacturing plant, above every work center is a cord that every worker and manager is trained to pull when something goes wrong, e.g. when a part is defective, when a required part is not available, or even when work takes longer than documented. When this happens, the team leader is alerted and immediately works to resolve the problem. If the problem cannot be resolved within a specified time, the production line is halted so that the entire organization can be mobilized to assist with problem resolution until a successful countermeasure has been developed.
The practice of swarming seems contrary to common management practice, as we're deliberately allowing a local problem to disrupt operations globally. What is more important is that swarming enables learning. It prevents the loss of critical information due to fading memories or changing circumstances, as with time it becomes impossible to reconstruct exactly what was going on when the problem occurred.
At the end of the day, this requires a culture that makes it safe, and even encouraged, to pull the Andon cord.
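In the technology value stream, the closest analogue of the Andon cord is a pipeline that stops at the first failed stage instead of passing the defect downstream. The following is a minimal sketch under that assumption; the stage names and checks are hypothetical placeholders, where real checks would build, test, and deploy.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> dict:
    """Run stages in order and stop the line at the first failure."""
    completed = []
    for name, check in stages:
        if not check():
            # "Pull the cord": nothing downstream of the defect is run.
            return {"halted_at": name, "completed": completed}
        completed.append(name)
    return {"halted_at": None, "completed": completed}


stages = [
    ("build", lambda: True),
    ("unit tests", lambda: False),        # a defect is detected here
    ("deploy to staging", lambda: True),  # never reached in this run
]

result = run_pipeline(stages)
```

Here `result["halted_at"]` is `"unit tests"` and only `"build"` is listed as completed. The halt is the signal to swarm the problem while the context is still fresh.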
Keep pushing quality closer to the source
When top-down, bureaucratic command and control systems become ineffective, it's usually because the variance between who should do something and who is actually doing something is too large, due to insufficient clarity and timeliness.
Everyone is needed in the value stream to find and fix problems in their area of control as part of our daily work. By doing this, quality and safety responsibilities and decision-making are pushed to where the work is performed, instead of relying on approvals from distant executives.
Enable optimizing for downstream work centers
According to Lean, our most important customer is our next step downstream. Optimizing our work for them requires having empathy for their problems to better identify the design problems that prevent fast and smooth flow.
In the technology value stream, work artifacts are optimized for downstream work centers by designing for operations, where operational NFRs (e.g. architecture, performance, stability, testability, configurability, and security) are prioritized.
The third way. Building the culture
The third way enables the creation of a generative, high-trust culture that supports dynamic, disciplined, and scientific approaches to experimentation and risk-taking, facilitating the creation of organizational learning from both successes and failures. The effects of new knowledge can be multiplied by transforming local discoveries into global improvements. Regardless of where someone performs work, they do so with the cumulative and collective experience of everyone in the organization. Back in 2009, Mike Rother concluded that the Lean community had missed the most important practice of all, which he called the improvement kata.
The third way focuses on creating a culture of continual learning and experimentation. These are the principles that enable the constant creation of individual knowledge, which is then turned into team and organizational knowledge. Individuals perform experiments in their daily work to generate new improvements, enabled by rigorous standardization of work procedures and documentation of results. By applying a scientific approach to both process improvement and product development, everyone in the organization can learn from successes and failures. Time is reserved for the improvement of daily work to further accelerate and ensure learning. By creating this continual and dynamic system of learning, teams are enabled to rapidly and automatically adapt to an ever-changing environment.
As an antithesis, it is worth mentioning a culture of fear and low trust, where workers have little ability to integrate improvements and learnings into their daily work, with suggestions for improvement "apt to meet a brick wall of indifference". In such an environment, leadership is actively suppressing, even punishing, learning and improvement, thereby perpetuating quality and safety problems.
Enabling organizational learning and safety culture
Dr. Westrum defined three types of culture:
| Pathological | Bureaucratic | Generative |
| --- | --- | --- |
| Information is hidden | Information may be ignored | Information is actively sought |
| Messengers are "shot" | Messengers are tolerated | Messengers are trained |
| Responsibilities are shirked | Responsibilities are compartmented | Responsibilities are shared |
| Bridging between teams is discouraged | Bridging between teams is allowed, but discouraged | Bridging between teams is rewarded |
| Failure is covered up | Organization is just and merciful | Failure causes inquiry |
| New ideas are crushed | New ideas create problems | New ideas are welcomed |
In the technology value stream, the foundations of a generative culture are established by striving to create a safe system of work. When accidents and failures occur, instead of looking for human error, the focus is on how the system could be redesigned to prevent the accident from happening again.
Institutionalize the improvement of daily work
Teams are often not able or willing to improve the processes they operate within. The result is not only that they continue to suffer from their current problems, but also that their suffering grows worse over time, as chaos and entropy degrade the processes.
Mike Orzen, author of "Lean IT", observed: "Even more important than daily work is the improvement of daily work."
The daily work is improved by explicitly reserving time to pay down technical debt, fix defects, and refactor and improve problematic areas of our code and environments. This could be achieved by reserving cycles in each development interval or by scheduling kaizen blitzes, which are periods when engineers self-organize into teams to work on fixing any problem they want.
Transform local discoveries into global improvements
When new learnings are discovered locally, there must also be some mechanism to enable the rest of the organization to use and benefit from that knowledge.
Leaders reinforce a learning culture
Traditionally, leaders lead by "making all the right decisions". However, there is significant evidence to the contrary: the leader's role is to create the conditions in which their team can discover greatness in their daily work. In other words, creating greatness requires both leaders and workers, who are mutually dependent on each other. A complementary working relationship and mutual respect must exist between leaders and frontline workers. Leaders must elevate the value of learning and disciplined problem-solving.
At each work center level, the value stream conditions frame a scientific experiment:
- the problem to solve is explicitly stated,
- as well as the hypothesis about the countermeasure that will solve it,
- the methods for testing that hypothesis,
- the interpretation of results,
- and the learnings to inform the next iteration.
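The elements of such an experiment can be written down as a small record, so that the hypothesis and its test are stated before the results come in. Everything in this sketch is hypothetical: the field names, the lead-time metric, the numbers, and the "lower is better" interpretation rule are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    problem: str        # the problem to solve, explicitly stated
    hypothesis: str     # the countermeasure expected to solve it
    metric: str         # the method for testing the hypothesis
    baseline: float     # where we are now
    target: float       # the target condition (lower is better here)
    measured: Optional[float] = None
    learnings: str = ""  # input for the next iteration

    def interpret(self) -> str:
        """Interpret the result against the baseline and the target."""
        if self.measured is None:
            return "not measured yet"
        if self.measured <= self.target:
            return "hypothesis supported"
        if self.measured < self.baseline:
            return "improved, target not reached"
        return "hypothesis not supported"


exp = Experiment(
    problem="Deployments take too long",
    hypothesis="Automating environment setup shortens the lead time",
    metric="deployment lead time in days",
    baseline=14,
    target=7,
)
status_before = exp.interpret()

exp.measured = 9
exp.learnings = "Setup is faster; manual approvals now dominate the lead time"
status_after = exp.interpret()
```

Recording the target before measuring keeps the interpretation honest: a result of 9 days against a target of 7 is an improvement, but it also points to the next obstacle to work on.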
The coaching questions the leader asks the person conducting the experiment:
- What was your last step and what happened?
- What did you learn?
- What is your condition now?
- What is your next target condition?
- What obstacle are you working on now?
- What is your next step?
- What is your expected outcome?
- When can it be checked?
Conclusion
Although fostering a culture of continual learning and experimentation is the principle of the Third Way, it is also interwoven into the First and Second Ways. In other words, improving flow and feedback requires an iterative and scientific approach that includes framing a target condition, stating a hypothesis about what will help us get there, designing and conducting experiments, and evaluating the results.
The benefits of using the three ways are not only better performance, but also increased resilience, higher job satisfaction, and improved organizational adaptability.
I started my life with a single absolute: that the world was mine to shape in the image of my highest values and never to be given up to a lesser standard, no matter how long or hard the struggle. (Ayn Rand, [0])
Further reading
- [0] Ayn Rand. Atlas Shrugged
- [1] Eliyahu M. Goldratt. The Goal
- [2] Patrick M. Lencioni. The Five Dysfunctions of a Team: A Leadership Fable
- [3] Jez Humble, Dave Farley. Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation
- [4] Eric Ries. The Lean Startup
- [5] Steven J. Spear. The High-Velocity Edge: How Market Leaders Leverage Operational Excellence to Beat the Competition
- [6] Mike Rother. Toyota Kata: Managing People for Improvement, Adaptiveness and Superior Results
- [7] David J. Anderson. Kanban: Successful Evolutionary Change for Your Technology Business
- [8] Mary Poppendieck. Implementing Lean Software Development: From Concept to Cash
- [9] Ichak Kalderon Adizes, Ph.D. Mastering Change. Introduction to Organizational Therapy
- [10] Dr. Peter Senge. The Fifth Discipline Fieldbook: Strategies and Tools for Building a Learning Organization
- [11] Total Productive Maintenance: Strategies and Implementation Guide