Most of us would probably say that our preventive maintenance programs could use some work. But where to even start? And once you have started, how do you keep it going? In this article, I’ll share what I’ve learned over the course of 20+ years in the industry. I’ll explore what you need to know, the steps you need to take, and the things you need to watch out for.
Ever gone on the web with the idea that you wanted to find a solution to help you improve your preventive maintenance (PM) program?
Did you come away confused and frustrated by a never-ending stream of acronyms, conflicting claims from a multitude of expensive consultants selling programs, and expensive software?
But let me say this about improving your PM program:
It doesn’t have to be complicated.
It doesn’t have to be expensive.
It doesn’t need fancy software.
It doesn’t require expensive consultants.
It does, however, require disciplined thought, insight into modern maintenance principles, a good understanding of your equipment, a collaborative approach, the adoption of standards, and an ability to implement change in your organisation.
I didn’t say it was easy.
DISCLAIMER: Now before we go any further, let me be very clear—I am not a certified reliability-centered maintenance (RCM) facilitator, trainer, or consultant. I have had training that covered RCM, preventive maintenance optimisation (PMO), instrumented protective functions (IPF), risk-based inspection (RBI), etc., and I have over 20 years of experience. It’s that practical experience that I am using to write this. If you want an in-depth treatment of RCM, I suggest you read John Moubray’s book RCMII or Anthony ‘Mac’ Smith’s book RCM – Gateway to World Class Maintenance.
You could maybe even get yourself certified as an RCMII facilitator. Just remember that RCM alone does not get you to world-class maintenance nor lead directly to a reliable plant.
In fact, there are many reasons why you should not run out and start implementing RCM in your plant. But I’ll discuss that in another article.
Later in this article, I will outline a step-by-step process that you can use to tackle your PM program. But first, let’s explore the things you need to have in place before you start that process.
The first is what I call “disciplined thought”, which means that you make decisions based on data, facts, and careful consideration. Too often, people jump to conclusions based on their own personal experience or views. Too often, these conclusions are not underpinned by fact or data.
When you start to optimise your PM program you really want to ensure that you use an evidence-based decision-making process.
It’s not hard. It just takes discipline to always consider if the decision you’re about to make is supported by facts, data and hard evidence. Not opinion.
In a future article, I’ll explore fact-based decision-making and how it supports defect elimination and root cause analysis.
A collaborative approach
Working in a diverse team with people from different backgrounds, technical trades and skills can usually help to enforce a fact-based decision-making process.
As long as you make sure that one single person is not allowed to ‘rule the room’.
You know what I mean: one strong-minded individual who, for whatever reason, is allowed to dictate his or her will to the group because of experience, charisma, or sheer force.
Don’t allow it to happen. Team-based decisions derived from data and facts are always going to be better decisions than the opinion of one person.
The other important aspect of taking a collaborative team approach is that you can bring other stakeholders into the analysis and decision-making process, which will reduce resistance when it comes to implementation.
So, think carefully about who you’ll need to convince when the time comes to implement your modified PMs, and bring them or their representatives into the process.
Typically, you want to have the operations or production team involved, plus engineering, and you must ensure you have representation from the frontline maintenance team.
Insight into modern maintenance principles
Before you start your review of your PM program, you need to make sure you and the people involved in the review process understand the basic principles of modern maintenance. I have outlined these in the post “9 Principles of Modern Maintenance”, which are summarised below:
Principle #1: Accept failures
Not all failures can be prevented by maintenance. And in some cases, the impact of the failure is so low that we simply accept it. Good maintenance programs don’t try to prevent every failure.
Principle #2: Most failures are not age related
Over 70% of failure modes are not age-related. In these cases, there is no point doing time-based life-renewal tasks like servicing or replacement. Condition monitoring would be more effective.
Principle #3: Some failures matter more than others
Good maintenance programs are risk-based in that they consider both the consequence and the likelihood of failures. They use this information to assess where scarce resources will deliver the greatest benefit.
Principle #4: Parts might wear out, but your equipment breaks down
Parts are simple components that have their own failure modes and failure patterns. Most modern machinery consists of many parts and should be treated as complex items. Because complex items have so many different failure modes, they don’t typically exhibit wear out, which makes time-based overhauls ineffective and wasteful.
Principle #5: Hidden failures must be found
Hidden failures are failures that remain undetected during normal operation. They’re often associated with equipment with protective functions. These failures only become evident when you need an item to work and it doesn’t (failure on demand), or when you conduct a failure-finding task.
Principle #6: Identical equipment does not mean identical maintenance
It’s possible to have two identical pieces of equipment operating in very different contexts (e.g., the classic duty-versus-standby case). A difference in criticality can also lead to different maintenance needs.
Principle #7: “You can’t maintain your way to reliability”
No amount of maintenance can raise the inherent reliability of a design. When a poor design is the cause of poor reliability or performance, the only solution is to change the design.
Principle #8: Good maintenance programs don’t waste your resources
Completing tasks that don’t add value, maintaining an unnecessary level of performance, and assigning multiple tasks to a single failure mode are all examples of wasteful practices that should be avoided.
Principle #9: Good maintenance programs become better maintenance programs
The most effective maintenance programs are dynamic and continuously improving. These programs look for ways to eliminate unnecessary tasks, convert time-based tasks into condition-based tasks, and extend task intervals.
Don’t underestimate the importance of knowing, understanding, and applying these principles. So many reviews of PM programs around the world in different industries keep coming back with the same findings: most PM programs are wasteful and ineffective.
And that is because they were not built with these principles in mind.
Wasteful PM programs have too many unnecessary tasks. They might also perform a task too frequently. Both cause several problems.
To start, they use up valuable resources: money and time. Material costs are higher than they need to be, and your maintenance labour is focused on the wrong things. This leaves less time for your scarce resources to focus on the tasks that are actually necessary.
It’s important to remember that these unnecessary tasks affect more than just the frontline maintenance technicians. Significant effort is required to plan, schedule, manage, and report on all this work.
And let’s not forget that intrusive maintenance can cause more harm than good through the introduction of defects (either in workmanship or material quality).
So how is it that most PM programs end up in such a spot?
One reason is that people tend not to like equipment failures. Too often, one of the first recommendations made after a piece of equipment fails is to create a new PM to prevent the failure from reoccurring.
This line of thinking doesn’t usually stop to consider what failure pattern applies to the failure mode in question, and whether a time-based task will actually help.
OEMs also have a role to play here.
Most manufacturers will provide some recommended maintenance tasks as part of an owner’s manual or even as a warranty requirement. The problem is that they don’t know your operating context. Their recommendations are probably going to be conservative and may not even apply to your operation.
Good understanding of your equipment
The other thing you need to make sure of is that when you go into the review process and tackle your PM tasks, you and the team involved truly understand the equipment you’re analysing. And you truly understand how your plant works.
For each piece of equipment you’re reviewing, you must be able to find and understand the main failure modes that are likely to occur, and understand the impact these failure modes would have on the wider plant.
Remember, a failure mode is an event that causes a functional failure. It’s best described as a verb + noun statement that describes the physical state of the item (e.g., fractured axle).
It’s also important to remember that most failure modes are not age-related. The occurrence of most failure modes is random.
This is where the collaborative, team-based approach once again helps because rarely does one person know all this. Having engineering staff, frontline maintenance staff, and operations personnel all together in the room when you discuss the failure modes and their consequences will lead to much better results.
So, make sure you have the right people in the room for these discussions. And that may very well mean that when you analyse different equipment you may need to have some different or additional people in the room.
I believe that an important aspect of improving your PM program is to adopt standards.
Adopt a standard for how you define and name failure modes, failure causes, and failure consequences.
ISO 14224:2016 can help you with this.
It covers the collection and exchange of reliability and maintenance data for equipment in the petroleum, petrochemical, and natural gas industries. It describes three main categories of information: equipment data, failure data, and maintenance data.
Adopting this standard will allow you to set up and collect reliability and maintenance data in a consistent way. This will help you to better understand your reliability issues and support you in your maintenance strategy optimisation efforts.
You should also adopt a standard for how you word maintenance tasks, and even more importantly, create a standard for the writing of your maintenance work instructions.
ASD-STE100 Simplified Technical English is a valuable resource that can help you write clear and consistent maintenance documentation. It gives you a set of rules and a dictionary for technical writing.
The rules and dictionary help to simplify complex technical language so that it’s easily understood by all readers. This reduces the likelihood of a task being misunderstood or performed incorrectly.
A good standard for creating maintenance task instructions will help to reduce variability, and that is key for the success of your PM program.
If your tasks are done by different people and every person does the task in a slightly different way, you will eventually struggle to determine if your task is ineffective or simply not being done the right way.
So be very clear when you write maintenance tasks: use a set of standardised verbs and nouns, always structure your tasks the same way and make sure you provide all information needed. Do not expect your technicians to go away and find more details from equipment manuals that could be anywhere. If they need to know it to do the job, include it.
You should also adopt a standard for how you set up your PM program in your computerised maintenance management system (CMMS) so you have clear failure modes and failure codes.
Link the failure codes your technicians enter when a failure has occurred to your PM tasks. This makes it much easier for your reliability engineer to analyse if a PM task is effective or not.
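To make that link concrete, here is a minimal sketch in Python. It assumes you can export failure records and PM tasks from your CMMS. The task names, failure codes, and equipment tags are made up for illustration, and a real CMMS will structure this data differently.

```python
from collections import Counter

# Hypothetical mapping: each PM task lists the failure codes it is meant to prevent.
pm_tasks = {
    "PM-001 Inspect pump seal": {"SEAL_LEAK"},
    "PM-002 Grease motor bearings": {"BEARING_SEIZED", "BEARING_NOISE"},
}

# Hypothetical failure records, as technicians might close them out in a CMMS.
failure_records = [
    {"equipment": "P-101", "failure_code": "SEAL_LEAK"},
    {"equipment": "P-102", "failure_code": "SEAL_LEAK"},
    {"equipment": "M-201", "failure_code": "BEARING_SEIZED"},
    {"equipment": "P-101", "failure_code": "SEAL_LEAK"},
    {"equipment": "C-301", "failure_code": "COUPLING_FRACTURED"},
]


def failures_per_pm(pm_tasks, failure_records):
    """Count the recorded failures that each PM task was supposed to prevent."""
    counts = Counter(record["failure_code"] for record in failure_records)
    return {task: sum(counts[code] for code in codes) for task, codes in pm_tasks.items()}


def unaddressed_codes(pm_tasks, failure_records):
    """Return failure codes that occurred but are not linked to any PM task."""
    covered = set().union(*pm_tasks.values()) if pm_tasks else set()
    return {record["failure_code"] for record in failure_records} - covered


print(failures_per_pm(pm_tasks, failure_records))
print(unaddressed_codes(pm_tasks, failure_records))
```

A PM task with many linked failures is a candidate for review: the task may be ineffective, at the wrong frequency, or poorly executed. A failure code with no linked task is a prompt to ask whether a task is warranted at all.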
And, make sure you develop and adopt a lubrication standard.
So many failures occur because of poor lubrication management and practices. I think that almost every PM review program needs to include a review of your lubrication practices. Make sure your lubrication standard covers the storage, handling, filtering, dispensing, disposing, and cleanliness of lubricants.
Ability to implement change in your organisation
Another important aspect you need to cover before starting your PM review is how you will ensure the required changes are implemented by the organisation.
Not just the physical changes in the PMs, but how you are going to get the new or revised standards rolled out and adopted. Emailing them to everyone in your plant won’t do much good.
You need to start by clearly identifying what good looks like, whether this is in terms of cost reduction or improved availability. Next, you need to develop a clear plan for how to get there.
After you’ve developed your vision and plan, you need to get senior leadership on board. Once they understand the value these changes will bring, it’ll be easier for others in the organisation to buy into the changes.
But the work doesn’t stop here.
An effective rollout requires an engaged front line, whether these are your maintainers, operators, or engineers.
You need to make both a rational and an emotional case for change. Show your people there’s a better way of doing things. Explain what’s in it for them.
Only then will you be able to effectively roll out your changes and improve your PM program.
Measure your success
How will you know if your changes were successful? Before you implement your changes, make sure you have some (simple) metrics documented that summarise the performance of your current PMs. That could be uptime, MTBF, costs associated with failures, etc. It doesn’t really matter what you choose, as long as it is a measure that is meaningful and important to the plant as a whole.
And ideally, you express these measures as a total cost metric that combines both costs incurred and lost revenue due to downtime. This is powerful, and it’s a language that your plant manager will understand. And appreciate.
Once you have implemented the changes to the PM program, monitor the impacts for a sustained period. At least 12 months, but maybe even 24 or 36 months.
Trend the improvements in the metrics you have selected, and convert those improvements into cost savings and additional revenue due to increased uptime.
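The arithmetic behind these metrics is simple. As a rough sketch in Python, the example below calculates MTBF and a total cost figure for a period before and after a PM change. All numbers are made up, and the value of an hour of lost production is an assumption you would need to agree with your own finance or production team.

```python
def mtbf_hours(operating_hours, failure_count):
    """Mean time between failures: operating hours divided by number of failures."""
    if failure_count == 0:
        return float("inf")  # no failures observed in the period
    return operating_hours / failure_count


def total_cost(maintenance_cost, downtime_hours, lost_margin_per_hour):
    """Costs incurred plus revenue lost to downtime, expressed as one number."""
    return maintenance_cost + downtime_hours * lost_margin_per_hour


# Made-up figures for one system: 12 months before and 12 months after the PM changes.
before = {"operating_hours": 8000, "failures": 10, "maintenance_cost": 120_000, "downtime_hours": 50}
after = {"operating_hours": 8200, "failures": 5, "maintenance_cost": 70_000, "downtime_hours": 20}
LOST_MARGIN_PER_HOUR = 4_000  # assumed value of one hour of lost production


def summarise(period):
    """Return the MTBF and total cost for one measurement period."""
    return {
        "mtbf_hours": mtbf_hours(period["operating_hours"], period["failures"]),
        "total_cost": total_cost(
            period["maintenance_cost"], period["downtime_hours"], LOST_MARGIN_PER_HOUR
        ),
    }


baseline, result = summarise(before), summarise(after)
saving = baseline["total_cost"] - result["total_cost"]
print(baseline, result, saving)
```

Capturing the baseline before you change anything is the important part. Without it, you have nothing to trend against.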
Don’t start with a major program
Big programs are costly. They often require a significant upfront investment, and they tend to show a slow return on investment. This makes them hard to get approved. They only work in organisations whose leadership believes in the value of both a maintenance department and an effective PM program.
For most organisations, you first need to demonstrate the return on investment (ROI), and you must do that quickly with limited upfront investment.
Start by looking for common mistakes and quick wins: high-frequency PMs, repeat failures, PMs with the most annualised workload, and high-cost PMs.
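Annualised workload is easy to underestimate for short, frequent tasks. The sketch below, in Python with made-up tasks and hours, converts each PM into hours per year and ranks them. The field names are illustrative only.

```python
# Hypothetical PM tasks with estimated hours per job and the interval between jobs.
pms = [
    {"task": "Annual gearbox overhaul", "hours_per_job": 40.0, "interval_days": 365},
    {"task": "Weekly pump inspection", "hours_per_job": 1.5, "interval_days": 7},
    {"task": "Monthly conveyor check", "hours_per_job": 4.0, "interval_days": 30},
]


def annualised_hours(pm):
    """Hours per job multiplied by the number of jobs in a year."""
    return pm["hours_per_job"] * 365 / pm["interval_days"]


# Rank tasks from the largest annual workload to the smallest.
ranked = sorted(pms, key=annualised_hours, reverse=True)

for pm in ranked:
    print(f'{pm["task"]}: {annualised_hours(pm):.1f} hours per year')
```

In this made-up example, the 1.5-hour weekly inspection consumes more labour per year than the 40-hour annual overhaul, which is why high-frequency PMs are a good place to look for quick wins.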
Another reason why you don’t want to start with a major program is that you likely won’t know what the best approach is.
You could adopt an RCM process or a PMO process, but if you are starting out, I wouldn’t recommend either until you actually do an initial in-depth review of your PM program yourself.
And here’s why: If you don’t know what’s wrong with your PM program, how do you know what to ask for? Running an RCM or PMO process will take a lot of resources, and it’s something that needs to be set up for success. Chances are you will only get one shot to prove its value to the organisation.
Fail, and RCM or PMO is done for in your organisation.
So, what should you do?
Step 1 – In-depth analysis of your current PMs
Conduct an in-house PM review.
Gather failure data from your CMMS and identify the bad actors in your PM program by developing the following charts:
- Ranked listing of the equipment and components that have failed most frequently in the last 1, 3, 5 or 10 years. Break out the most common failure modes. Don’t fall into the trap of thinking that every failure mode can be or is best resolved with a new or better PM. Sometimes you need to eliminate the defect that is causing the failures, whether by improving the equipment design, the way the equipment is being operated or the standards to which you are maintaining it. This is why I talked so much about evidence-based decision making. Get to the bottom of those dominant failure modes.
- Ranked listing of the equipment and components that have cost you the most money in terms of preventive maintenance and corrective maintenance (CM) in the last 1, 3, 5 or 10 years. Spend some time scrutinising those charts. Does the equipment with high PM costs incur CM costs? A lot or almost none? Could you be over-maintaining? Do you have PMs in place for the equipment with high CM costs? No? Well, analyse the failure modes and determine whether they can be effectively managed through maintenance; otherwise, you had better go down the route of defect elimination. And what if you have high CM and high PM costs? If that’s the case, then clearly your PMs are not effective. Maybe the task is inappropriate, at the incorrect frequency or not properly executed. Or maybe this is a failure mode that you can’t mitigate through PM.
- Ranked listing of your most common PM tasks in terms of annualised hours by equipment or equipment type. Does the chart make sense? Are you spending most of your PM effort on your critical equipment? Or are you spending a lot of time and money on equipment that you are actually not too concerned about if it were to fail?
- Ranked listing of your most common CM tasks. What are the repairs you keep repeating? Is there an opportunity to resolve them with PM or defect elimination (DE)?
- Ranked listing of your most common spare parts. Which components are you replacing on a regular basis? Are they exhibiting wear out or random failure patterns? Are there any tasks currently in place to monitor or mitigate these failure modes? If so, should the frequency be adjusted? If not, what tasks should be in place, if any?
- Ranked listing of your most common emergency maintenance work. What repairs are regularly breaking into your schedule? Have you found the true root cause of these failures? Are your PM tasks at the right frequency? Is a design change needed?
Test your results with your frontline personnel and technical support functions. Does it match their knowledge and understanding of where the issues are? If your CMMS data is patchy, tapping into the knowledge of your frontline teams is key.
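You don’t need special software to build these ranked listings. As a minimal sketch, the Python below works from a list of work orders such as you might export from a CMMS. The equipment tags, failure modes, costs, and field names are all made up, and your own export will look different.

```python
from collections import Counter, defaultdict

# Hypothetical work order export: PM and CM jobs with their costs.
work_orders = [
    {"equipment": "P-101", "type": "CM", "failure_mode": "seal leaking", "cost": 4000},
    {"equipment": "P-101", "type": "CM", "failure_mode": "seal leaking", "cost": 3500},
    {"equipment": "P-101", "type": "PM", "failure_mode": None, "cost": 500},
    {"equipment": "C-301", "type": "PM", "failure_mode": None, "cost": 6000},
    {"equipment": "C-301", "type": "PM", "failure_mode": None, "cost": 6000},
    {"equipment": "M-201", "type": "CM", "failure_mode": "bearing seized", "cost": 9000},
    {"equipment": "P-101", "type": "CM", "failure_mode": "impeller eroded", "cost": 2000},
]


def rank_failures(work_orders):
    """Equipment ranked by number of corrective (CM) work orders, highest first."""
    counts = Counter(w["equipment"] for w in work_orders if w["type"] == "CM")
    return counts.most_common()


def dominant_failure_modes(work_orders):
    """Most frequently recorded failure mode for each piece of equipment."""
    modes = defaultdict(Counter)
    for w in work_orders:
        if w["type"] == "CM":
            modes[w["equipment"]][w["failure_mode"]] += 1
    return {equipment: c.most_common(1)[0][0] for equipment, c in modes.items()}


def cost_split(work_orders):
    """Total PM cost and total CM cost for each piece of equipment."""
    costs = defaultdict(lambda: {"PM": 0, "CM": 0})
    for w in work_orders:
        costs[w["equipment"]][w["type"]] += w["cost"]
    return dict(costs)


print(rank_failures(work_orders))
print(dominant_failure_modes(work_orders))
print(cost_split(work_orders))
```

In this made-up data, C-301 has high PM cost and no CM cost, which raises the over-maintaining question, while M-201 has CM cost and no PM at all. The numbers only point you to where to ask questions. They don’t answer them.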
Step 2 – Prioritise focus areas
Use the results of your in-depth analysis to prioritise your areas of focus. Be sure to set realistic goals and timelines.
Taking on too much too quickly will only result in frustration and poor adoption.
I often recommend starting with some quick wins. These are low-effort, high-return activities that help build momentum and support for the improvement effort.
As you gain traction and success, you can move on to more complex issues.
Step 3 – Decide on methodology
Once you have improvement opportunities clearly defined and prioritised, you need to decide which process or methodology to use.
Be careful not to fall into the great debate around RCM and PMO when you are in the early stages of optimising your PM program.
Early on you can get away with a quick and dirty approach, and this is what I would recommend. It creates buy-in for optimisation, enables quick wins, and allows you to set up a more robust improvement program.
But you must understand and properly apply the core principles of maintenance engineering (how to treat hidden functions, non-age-related failures, etc.).
In a hazardous industry, you really do need a formalised program. You can use RCM, or a combination of other processes (PMO, RBI, IPF), but it must be a robust and auditable living program that undergoes continuous improvement.
Step 4 – Conduct a pilot
Whatever methodology you choose, you’ll want to start with a pilot.
Pilot on a small system or part of a complex system. Pick something that will be relatively easy to deliver with a high ROI.
Evaluate the PMs or repeat failures. Ask the following:
- Do you need to replace fixed-time overhaul tasks with more (cost-) effective condition-based tasks?
- Do you have dominant age-related failure modes? Really? Are you sure?
- What are your dominant random failure modes?
- Do you have hidden failure modes?
- Do you have protective functions? Do they have failure-finding tasks assigned?
Once you’ve answered those questions, you can complete your analysis, summarise the benefits and present everything to management.
Be sure to extrapolate your findings/results to the remaining opportunities you identified in Step 1, then build a business case (i.e., a case for change) to do a larger PM improvement program.
Step 5 – Initiate and deliver PM improvement project
Your case for change should have been done before you started, but if you didn’t do one, create one now and make sure you have clear targets.
The next step is to get people on board: engineers, frontline technicians, operators, and a good facilitator who really understands the principles of building an effective and efficient PM program.
Once you’ve got the team set up, your next job is to educate the team. Teach them all about the benefits this project will bring. Put it in terms they can relate to. Clearly explain how these improvements will benefit them.
After that, you need to build a realistic project plan.
Set the project up as a series of small projects: complete the analysis and implement your findings on one system, then document your results, benefits, and lessons learned before you go to the next system. This way the business will get benefits sooner. Don’t analyse all the systems and then implement all the changes. This takes too long and delays the bottom-line impact, which puts the project at risk.
Of course, there are a whole series of steps involved in setting up and delivering a project like this, but I won’t go into those here.
- Big PMO/RCM programs can be very hard to deliver successfully, whether because you don’t get the resources you really need, you don’t pay enough attention to the change management required, you fail to properly implement it in your CMMS, or your RCM analysis does not result in well-written PM tasks. There are many possible problems.
- If you opt for an externally facilitated RCM or PMO program, you need to ensure you have high-level support and commitment, not just for the initial pilot but also for the ongoing program. You must get the best possible facilitator you can afford. You must make sure you really train and coach your own organisation. You need to create and implement a business process so that what you develop is actually sustained once it’s implemented.
- You do not need special software, though it can make everything a lot more efficient.
- Tapping into the knowledge of your maintenance organisation (craftsmen, engineers, operators, etc.) is key.
- The use of task libraries can be efficient, and I have used them very successfully, but be careful. Very careful. Libraries do not account for different operating conditions (e.g., duty service versus standby service), nor do they account for different consequences of failure.
Don’t forget the actual maintenance work instructions/job plans! It is essential that you get repeatable maintenance tasks, and that requires good work instructions. Without repeatable work, you’ll have too much variation and you won’t be able to judge if your PM program is effective or not. Plus, poor work instructions undermine productivity.
Step 6 – Build continuous improvement loops
The key to having a sustainable program is to document an ongoing PM review process and build it out as a living program. It can’t just be a one-and-done deal. You need to evaluate your PMs, but you also need to evaluate your PM review practices for opportunities to be more efficient and effective.
Another important loop is found in the feedback from maintenance crews to planners on job plans. Your frontline workers are the best ones to tell you whether the tasks are correct, the materials are right, and the hours are appropriate.
Your program should also account for some data analysis by your reliability engineers. They should be monitoring equipment reliability and availability, and gauging the overall success of the program changes.
Part of this analysis will include bad actor trending and performance. What are the few (20%) assets causing most (80%) of the problems? Implementing a bad actor program is a great way to not only ensure you’re dealing with high-cost, repeat failures but also generate visible wins within the organisation.
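A bad actor list can be produced with a simple Pareto calculation. The sketch below, in Python with made-up costs per asset, returns the smallest set of top-ranked assets that accounts for a chosen share of the total cost. The 80% threshold is a rule of thumb, not a fixed rule.

```python
def bad_actors(cost_by_asset, share=0.8):
    """Return the top-ranked assets that together reach `share` of total cost."""
    total = sum(cost_by_asset.values())
    ranked = sorted(cost_by_asset.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for asset, cost in ranked:
        if running >= share * total:
            break  # the assets already selected cover the requested share
        selected.append(asset)
        running += cost
    return selected


# Made-up failure-related costs per asset over the review period.
costs = {
    "P-101": 52_000,
    "C-301": 31_000,
    "M-201": 7_000,
    "V-401": 5_000,
    "F-501": 3_000,
    "T-601": 2_000,
}

print(bad_actors(costs))
```

In this example two of the six assets account for 83% of the cost, so those two are where the reliability engineer’s attention should go first.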
Improving a PM program is no small feat. But it’s achievable with discipline, collaboration, sound technical knowledge, and a strong understanding of the value an optimised program can bring.
You don’t need fancy software or expensive consultants. You do need a good understanding of where your program is at today, what good looks like, a consistent and methodical approach, and the tools to educate and motivate your workforce.
Start small, stay focused, and celebrate the quick wins. These are the keys to sustainably improving your PM program.