In this article, I provide a brief history of the development of Reliability Centered Maintenance (RCM). And from there we explore 9 principles derived from RCM that will help you build an effective Preventive Maintenance Program. As a maintenance & reliability practitioner, you should know these RCM principles and live by them.
Fix it when it breaks
For most of human history, we’ve had a very simple approach to maintenance: we fixed things as they broke. This served us well from our early days huddled around campfires until about World War II.
In those days industry was not very complex or highly mechanized. Downtime was not a major issue and preventing failures wasn’t a concern.
Figure 1: 1st Generation Maintenance was a long way from modern-day RCM
At the same time, most equipment in use was simple and more importantly, it was over-designed. This made equipment reliable and easy to repair. And most plants operated without any preventive maintenance in place. Maybe some cleaning, minor servicing, and lubrication, but that was about it.
This simple ‘fix it when it breaks’ approach to maintenance is often referred to as First Generation Maintenance. 1
Things changed during World War II
Wartime increased the demand for many, diverse products. Yet at the same time, the supply of industrial labour dropped. Productivity became a focus. And mechanization increased. By the 1950s more and more complex machines were in use across almost all industries. Industry as a whole had come to depend on machines.
And as this dependence grew, it became more important to reduce equipment downtime. ‘Fix it when it’s broken’ no longer suited industry.
Figure 2: 2nd Generation Maintenance was dominated by time-based overhauls
A focus on preventing equipment failures emerged. And the idea took hold that failures could be prevented with the right maintenance at the right time. In other words, the industry moved from breakdown maintenance to time-based preventive maintenance. Fixed interval overhauls or replacements to prevent failures became the norm.
This approach to preventive maintenance is known as Second Generation Maintenance. 2
More maintenance, more failures
Between the 1950s and 1970s, the third generation of maintenance was born in the aviation industry.
After World War II air travel became widely accessible. And passenger numbers grew fast. By 1958 the Federal Aviation Administration (FAA) had become concerned about reliability. And passenger safety.
At the time the dominant thinking was that components had a specific life. That components would fail after reaching a certain “age”. Replacing components before they reached that age would thus prevent failure. And that was how you ensured reliability and passenger safety.
In the 1950s and 1960s the typical aircraft engine overhaul was every 8,000 hours. So when the industry was faced with an increasing number of failures, the conclusion was easy. Obviously component age must be less than the 8,000 hours that was being assumed. So, maintenance was done sooner. The time between overhauls was reduced.
But, increasing the amount of preventive maintenance had three very unexpected outcomes. Outcomes that eventually turned the maintenance world upside down.
First of all, the occurrence of some failures decreased. That was exactly what everybody expected to happen. All good.
The second outcome was that a larger number of failures occurred just as often as before. That was not expected and slightly confusing.
The third outcome was that most failures occurred more frequently. In other words, more maintenance led to more failures. That was counter-intuitive. And a shock to the system.
Figure 3: the findings that led to RCM were a shock to the system
The birth of Reliability Centered Maintenance
To say that the results frustrated both the FAA and the airlines would be an understatement. The FAA worried that reliability had not improved. And the airlines worried about the ever-increasing maintenance burden.
So during the 1960s the airlines and the FAA established a joint task force to find out what was going on. After analyzing 12 years of data the task force concluded that overhauls had little or no effect on overall reliability or safety.
For many years engineers had thought that all equipment had some form of wear out pattern. In other words, that as equipment aged the likelihood of failure increased. But the study found this universally accepted concept did not hold true.
Instead, the task force found six patterns describing the relationship between age and failure. And that the majority of failures occur randomly rather than based on age.
The task force findings were used to develop a series of guidelines for airlines and airplane manufacturers on the development of reliable maintenance schedules for airplanes.
The first guideline, titled “Maintenance Evaluation and Program Development”, came out in 1968. The guide is often referred to as MSG-1 and was specifically written for the Boeing 747-100.
The maintenance schedule for the 747-100 was the first to apply Reliability Centered Maintenance concepts using MSG-1. And it achieved a 25% to 35% reduction in maintenance costs compared to prior practices.
As a result, the airlines lobbied to remove all the 747-100 terminology from MSG-1. They wanted the maintenance schedules for all new commercial planes designed using the same process.
The result was MSG-2, released in 1970 titled “Airline/Manufacturer Maintenance Program Planning”.
Figure 4: RCM led to the birth of 3rd generation maintenance thinking
Amazing results from the first applications of Reliability-Centered Maintenance (RCM)
The move to 3rd Generation or Reliability-Centered Maintenance as outlined in MSG-1 and MSG-2 was dramatic.
The DC-8’s maintenance schedule used traditional, 2nd Generation Maintenance concepts. It required the overhaul of 339 components and called for more than 4,000,000 labour hours before reaching 20,000 operating hours.
Compare that to the maintenance schedule for the Boeing 747-100, developed using MSG-1. It required just 66,000 labour hours before reaching the same 20,000 operating hours! 3
Another interesting comparison is to compare the number of items requiring fixed-time overhauls. The maintenance for the DC-10 was developed using MSG-2 and required the overhaul of just 7 items versus the 339 on the DC-8.
And both the DC-10 and Boeing 747-100 were larger and more complex than the DC-8.
Impressive results. And the US Department of Defense (DoD) thought so too.
Figure 5: the benefits of reliability centered maintenance for the airline industry were massive
The US Department Of Defense gets involved in RCM
So in 1974, the DoD asked United Airlines to write a report on the processes used to write reliable maintenance programs for civilian aircraft. And in 1978 Stan Nowlan and Howard Heap published their report. It was titled “Reliability Centered Maintenance”.
Since then a lot more work has been done to advance the cause of Reliability-Centered Maintenance. The airline industry has moved to MSG-3. John Moubray published his book RCM2 in the 1990s, introducing Reliability Centered Maintenance concepts to industry at large.
Nowadays, RCM is defined through international standards. But it’s the work done in the 60s and 70s, culminating in the Nowlan & Heap report in 1978, that all modern-day RCM approaches can be traced back to.
That’s now more than 40 years ago. So any Maintenance & Reliability professional should be familiar with it by now. It’s been around long enough. It’s well documented. And widely available.
Unfortunately, we find that’s not the case. The principles of modern maintenance as developed in the journey to Reliability-Centered Maintenance are not always known or understood. Let alone applied.
The rest of this article will outline those principles. They should underpin any sound maintenance program.
One of the best summaries of these principles can be found in the NAVSEA RCM Handbook. 4 I would highly recommend reading it. It is well written and easy to understand. And the following Principles of Modern Maintenance are very much built on the ‘Fundamentals of Maintenance Engineering’ as described in the NAVSEA manual.
9 Principles from RCM to create an effective PM program
Whether you are developing a new maintenance program or improving the maintenance program for an existing plant, all reliable maintenance programs should be based on the following Principles of Modern Maintenance:
Principle #1: Accept Failures
Principle #2: Most Failures Are Not Age-Related
Principle #3: Some Failure Consequences Matter More Than Others
Principle #4: Parts Might Wear Out, But Your Equipment Breaks Down
Principle #5: Hidden Failures Must Be Found
Principle #6: Identical Equipment Does Not Mean Identical Maintenance Strategy
Principle #7: “You Can’t Maintain Your Way To Reliability” 5
Principle #8: Good Maintenance Programs Don’t Waste Your Resources
Principle #9: Good Maintenance Programs Become Better Maintenance Programs
As a Maintenance & Reliability professional, you must understand these principles.
You must practice them.
You must live by them.
Principle #1: Accept failures
Not all failures can be prevented by maintenance. Some failures are the result of events outside our control. Think lightning strikes or flooding. For events like these, more or better maintenance makes no difference. Instead, the consequences of events like these should be mitigated through design.
And maintenance can do little about failures that are the result of poor design, lousy construction or bad procurement decisions.
In other cases the impact of the failure is low so you simply accept a failure (think general area lighting).
So, good maintenance programs do not try to prevent all failures. Good maintenance plans and programs accept some level of failures and are prepared to deal with the failures they accept (and deem credible).
Principle #2: Most failure modes are not age-related
As explained above the RCM research by the airline industry has shown that 70% – 90% of failure modes are not age-related. Instead, for most failure modes the likelihood of occurrence is random. Later research by the United States Navy and others found very similar results.
This research is summarised in the six different failure patterns shown below: 6 7
Figure 6: the failure patterns from various reliability centered maintenance studies
Apart from showing that most failure modes occur randomly, these failure patterns also highlight that infant mortality is common. And that it typically persists. That means that the probability of failure only becomes constant after a significant amount of time in service.
Don’t interpret Curves D, E, and F to mean that (some) items never degrade or wear out. Everything degrades with time, that’s life. But many items degrade so slowly that wear out is not a practical concern. These items do not reach the wear-out zone in normal operating life.
So what do these patterns tell us about our reliable maintenance programs?
Historically maintenance was done in the belief that the likelihood of failure increased over time (second generation maintenance thinking). It was thought that well-timed maintenance could reduce the likelihood of failure. RCM has taught us that for at least 70% of failure modes this simply is not the case.
For the 70% of failure modes which have a constant probability of failure, there is no point in doing time-based life-renewal tasks like servicing or replacement.
It makes no sense to spend maintenance resources to service or replace an item whose reliability has not degraded. Or whose reliability cannot be improved by that maintenance task.
In practice, this means that 70% – 90% of equipment failure modes would benefit from some form of condition monitoring. And only 10% – 30% can be effectively managed by time-based replacement or overhaul.
Yet most of our PM programs are full of time-based replacements and overhauls.
Strictly speaking, the studies that documented the fact that 70% to 90% of failure modes were random only found this in specific industries and applications. And they were not conducted in major industries like oil & gas, mining, chemical manufacturing, food processing, power generation etc. So you could therefore dismiss these results with a simple “well, we’re in a different industry, so obviously this doesn’t apply to us”. I strongly believe this principle applies to all heavy industry (more on that in a future article) and would strongly encourage you to approach this with the question “Why wouldn’t it apply to us?” and then carefully examine what you could gain from applying this principle.
Principle #3: Some failure consequences matter more than others
When deciding on whether to do a maintenance task consider the consequence of not doing it. What would be the consequence of letting that specific failure mode occur?
Avoiding that consequence is the benefit of your maintenance.
The return on your investment.
And that is exactly how maintenance should be seen: as an investment. You incur a maintenance cost in return for a benefit in sustained safety and reliability. And as with all good investments, the benefit should outweigh the original investment.
So, understanding failure consequences is key to developing a good maintenance program. One with a good return on investment.
Just as not all failures have the same probability, not all failures have the same consequence.
Even if it relates to the same type of equipment.
Consider a leaking tank. The consequence of a leaking tank is severe if the tank contains a highly flammable liquid. But if the tank is full of potable water the consequence might not be of great concern.
But what if the water is required for fire fighting?
The same tank, the same failure but now we might be more concerned. We would not want to end up in a scenario of not being able to fight a fire because we had an empty tank due to a leak.
Apart from the consequence of a failure you also need to think about the likelihood of the failure actually occurring.
Maintenance tasks should be developed for dominant failure modes only. Those failures that occur frequently and those that have serious consequences but are less frequent to rare. Avoid assigning maintenance to non-credible failure modes. And avoid analyzing non-credible failure modes. It eats up your scarce resources for no return.
A maintenance program should consider both the consequence and the likelihood of failures. And since Risk = Likelihood x Consequence we can conclude that good maintenance programs are risk-based.
Good maintenance programs use the concept of risk to assess where to use our scarce resources to get the greatest benefit. The biggest return on our investment.
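To make the Risk = Likelihood x Consequence idea concrete, here is a minimal sketch that ranks a few failure modes by risk. The class and function names, the likelihoods, and the cost figures are all made up for illustration and loosely follow the tank example above. They are assumptions, not data from any study.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    likelihood_per_year: float  # assumed: expected failures per year
    consequence_cost: float     # assumed: cost of one failure, any currency

    @property
    def risk(self) -> float:
        # Risk = Likelihood x Consequence
        return self.likelihood_per_year * self.consequence_cost


def rank_by_risk(failure_modes):
    """Return the failure modes sorted from highest to lowest risk."""
    return sorted(failure_modes, key=lambda fm: fm.risk, reverse=True)


# Illustrative numbers only (hypothetical).
tank_leaks = [
    FailureMode("Leak, potable water tank", 0.10, 5_000),
    FailureMode("Leak, firewater tank", 0.10, 500_000),
    FailureMode("Leak, flammable liquid tank", 0.05, 2_000_000),
]

for fm in rank_by_risk(tank_leaks):
    print(f"{fm.name}: risk = {fm.risk:,.0f} per year")
```

With these assumed numbers the flammable liquid tank ranks first even though its leak is the least likely, and the firewater tank ranks well above the potable water tank despite being the same equipment with the same likelihood of failure. In practice many sites use a qualitative risk matrix rather than monetary values, but the ranking logic is the same.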
Principle #4: Parts might wear out, but your equipment breaks down
A ‘part’ is usually a simple component, something that has relatively few failure modes. Some examples are the timing belt in a car, the roller bearing on a drive shaft, the cable on a crane.
Simple items often provide early signals of potential failure, if you know where to look. And so we can often design a task to detect potential failure early on and take action prior to failure.
For those simple items which do “wear out” there will be a strong increase in the probability of failure past a certain age. If we know the typical wear-out age for a component, we can schedule a time-based task to replace it before failure.
When it comes to complex items made of many “simple” components, things are different.
All those simple components have their own failure modes, each with its own failure pattern. Because complex items have so many, varied failure modes, they typically do not exhibit a wear-out age. Their failures do not tend to be a function of age but occur randomly. Their probability of failure is generally constant as represented by curves E and F.
Most modern machinery consists of many components and should be treated as complex items. That means no clear wear-out age. And without a clear wear-out age, performing time-based overhauls is ineffective. And wasteful of our scarce resources.
Only where we can prove that an item has a wear-out age does performing time-based overhaul or component replacement make sense.
Principle #5: Hidden failures must be found
Hidden failures are failures that remain undetected during normal operation. They only become evident when you need the item to work (failure on demand). Or when you conduct a test to reveal the failure – a failure finding task.
Hidden failures are often associated with equipment with protective functions. Something like a high-high pressure trip. Protective functions like these are not normally active. They are only required to function by exception: to protect your people from injury or death, to protect the environment from a major impact, or to protect our assets from major damage. This means we pretty much always conduct failure finding maintenance tasks on equipment with protective functions.
To be clear, a failure finding task does not prevent failure. Instead, a failure finding task does exactly what its name implies. It seeks to find a failure. A failure that has already happened, but has not been revealed to us. It has remained hidden.
We must find hidden failures and fix them before the equipment is required to operate.
Principle #6: Identical equipment does not mean identical maintenance strategy
Just because two pieces of equipment are the same doesn’t mean they need the same maintenance. In fact, they may need completely different maintenance tasks.
The classic example is two identical pumps in a duty-standby setup. 8 Same manufacturer, same model. Both pumps process the exact same fluid under the same operating conditions. But Pump A is the duty pump, and Pump B is the standby. Pump A normally runs and Pump B is only used when Pump A fails.
When it comes to failure modes Pump B has an important hidden failure mode: it might not start on demand. In other words, when Pump A fails or is under maintenance, you suddenly find that Pump B won’t start. Oops.
Pump B doesn’t normally run so you wouldn’t know it couldn’t start until you came to start it. That’s the classic definition of a hidden failure mode. And hidden failure modes like this require a failure finding task, i.e. you go and test to see if Pump B will start. But you don’t need to do this for Pump A because it’s normally running, so a failure would be evident straight away.
So when building a maintenance program you must consider the operating context (RCM is very clear on this, but other approaches sometimes neglect operating context).
A difference in criticality can also lead to different maintenance needs. Safety or production critical equipment will need more monitoring and testing than the same equipment in low criticality service.
It’s important to reinforce that identical equipment may have different maintenance requirements. This is far too often forgotten or simply ignored for convenience. But you could find yourself facing critical failures by ignoring this basic concept. Especially if you use a library of preventive maintenance tasks.
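The duty-standby example can be sketched as a small decision function. This is a deliberately simplified illustration and not the formal RCM decision logic from SAE JA1011 or any handbook. The `Task` and `FailureMode` names, the order of the checks, and the flag values given to the two pumps are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    ACCEPT = "accept the failure, no scheduled task"
    FAILURE_FINDING = "failure-finding test"
    CONDITION_BASED = "condition-based task"
    TIME_BASED = "time-based replacement or overhaul"
    REDESIGN = "no effective task, consider a design change"


@dataclass
class FailureMode:
    description: str
    hidden: bool               # not evident during normal operation
    early_warning: bool        # gives a detectable signal before failure
    proven_wear_out_age: bool  # wear-out age demonstrated by data
    low_consequence: bool = False


def select_task(fm: FailureMode) -> Task:
    """Pick a task type for one failure mode (simplified, assumed ordering)."""
    if fm.low_consequence:
        return Task.ACCEPT            # Principle #1
    if fm.hidden:
        return Task.FAILURE_FINDING   # Principle #5
    if fm.early_warning:
        return Task.CONDITION_BASED
    if fm.proven_wear_out_age:
        return Task.TIME_BASED        # Principle #4
    return Task.REDESIGN              # Principle #7


# Same pump model, different operating context (flag values are assumed).
duty = FailureMode("Pump A cannot deliver flow",
                   hidden=False, early_warning=True, proven_wear_out_age=False)
standby = FailureMode("Pump B cannot deliver flow",
                      hidden=True, early_warning=False, proven_wear_out_age=False)

for fm in (duty, standby):
    print(f"{fm.description}: {select_task(fm).value}")
```

The point of the sketch is that the equipment type never appears in the decision. Only the failure mode and its operating context do, which is why the two identical pumps end up with different tasks.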
Principle #7: “You can’t maintain your way to reliability”
I love this quote from Terrence O’Hanlon and it’s so very true. Maintenance can only preserve your equipment’s inherent design reliability and performance.
If the equipment’s inherent reliability or performance is poor, doing more maintenance will not help.
No amount of maintenance can raise the inherent reliability of a design.
To improve poor reliability or performance that’s due to poor design, you need to change the design. Simple.
When you encounter failures – defects – that relate to design issues you need to eliminate them.
Sure, the more proactive and more efficient approach is to ensure that the design is right to begin with. But all plants start up with design defects. Even proactive plants. And that’s why the most reliable plants in the world have an effective defect elimination program in place.
Principle #8: Good maintenance programs don’t waste your resources
This seems obvious, right? But when we review PM programs we often find maintenance tasks that add no value. Tasks that waste resources and actually reduce reliability and availability.
It’s so common for people to say “whilst we do this, let’s also check this. It only takes 5 minutes.”
But 5 minutes here and there, every week or every month and we’ve suddenly wasted a lot of time. And potentially introduced a lot of defects that can impact equipment reliability down the line.
Another source of waste in our PM programs is trying to maintain a level of performance and functionality that we don’t actually need.
Equipment is often designed to do more than what it is required to do in its actual operating conditions. As maintainers, we should be very careful about maintaining to design capabilities. Instead, in most cases, we should maintain our equipment to deliver to operating requirements. Maintenance done to ensure equipment capacity greater than actually needed is a waste of resources.
Similarly, avoid assigning multiple tasks to a single failure mode. It’s wasteful and it makes it hard to determine which task is actually effective. Stick to the rule of a single, effective task per failure mode as much as you can. Only for very high consequence failure modes should you consider having multiple, diverse tasks to a single failure mode.
Most organisations have more maintenance to do than resources to do it with. Use resources on unnecessary maintenance, and you risk not completing necessary maintenance. And not completing necessary maintenance, or completing it late, increases the risk of failures.
And when that unnecessary maintenance is intrusive it gets worse. Experience shows that intrusive maintenance leads to increased failures. This could be because of human error, such as simple mistakes. Or because of defective materials or parts, or errors in technical documentation.
A lot of maintenance is done with the equipment off-line. So doing unnecessary maintenance can also increase production losses.
So make sure you remove unnecessary maintenance from your system. Make sure you have a clear and legitimate reason for every task in your maintenance program. Make sure you link all tasks to a dominant failure mode. And have clear priorities for all maintenance tasks. That allows you to prioritise tasks. In the real world, we are all resource-constrained.
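As a rough illustration of how you might screen a PM program against these rules, the sketch below flags tasks that are not linked to any failure mode and failure modes that have more than one task assigned. The task list, the equipment tags, and the `audit` function are hypothetical. A real review would pull this data from your CMMS and still needs engineering judgment on each flagged item.

```python
from collections import defaultdict

# Hypothetical PM task list. Each task names the failure mode it addresses,
# or None when nobody can say which failure mode it is meant to manage.
pm_tasks = [
    {"task": "Vibration check P-101", "failure_mode": "P-101 bearing wear"},
    {"task": "Bearing temperature check P-101", "failure_mode": "P-101 bearing wear"},
    {"task": "Trip test PSHH-200", "failure_mode": "PSHH-200 fails to trip"},
    {"task": "General look-over P-101", "failure_mode": None},
]


def audit(tasks):
    """Return (tasks with no failure mode, failure modes with several tasks)."""
    unlinked = [t["task"] for t in tasks if not t["failure_mode"]]

    tasks_by_mode = defaultdict(list)
    for t in tasks:
        if t["failure_mode"]:
            tasks_by_mode[t["failure_mode"]].append(t["task"])

    duplicated = {mode: names for mode, names in tasks_by_mode.items()
                  if len(names) > 1}
    return unlinked, duplicated


unlinked, duplicated = audit(pm_tasks)
print("No linked failure mode:", unlinked)
print("More than one task per failure mode:", duplicated)
```

Unlinked tasks are candidates for removal. Failure modes with several tasks are candidates for consolidation, unless the consequence is high enough to justify multiple, diverse tasks.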
Principle #9: Good maintenance programs become better maintenance programs
The most effective maintenance programs are dynamic. They are changing and improving continuously. Always making better use of our scarce resources. Always becoming more effective at preventing those failures that matter to our business.
When improving your maintenance program you need to understand that not all improvements have the same leverage:
First, focus on eliminating unnecessary maintenance tasks. This eliminates the direct cost of maintenance labour and materials. But it also removes the effort required to plan, schedule, manage, and report on this work.
Second, change time-based overhaul or replacement tasks into condition-based tasks. Instead of replacing a component every so many hours, use a condition monitoring technique to assess how much life the component has left. And only replace the component when actually required.
And third, extend task intervals. Do this based on data analysis, operator and maintainer experience. Or simply on good engineering judgment. Remember to observe the results.
The shorter the current interval, the greater the impact when extending that interval. For example, adjusting a daily task to weekly reduces the required PM workload for that task by more than 80%.
This is often the simplest and one of the most effective improvements you can make.
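The arithmetic behind the daily-to-weekly example is simple enough to check yourself. The sketch below uses two hypothetical helper functions and assumes a 365-day year with the task executed once per interval.

```python
def executions_per_year(interval_days: float) -> float:
    """Number of times a task is done per year at a given interval."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    return 365 / interval_days


def workload_reduction(current_days: float, new_days: float) -> float:
    """Fractional reduction in executions when the interval is extended."""
    return 1 - executions_per_year(new_days) / executions_per_year(current_days)


# Daily -> weekly, the example from the text.
print(f"{workload_reduction(1, 7):.1%}")    # 85.7%

# The same 6-day extension applied to a 30-day task saves far less.
print(f"{workload_reduction(30, 36):.1%}")  # 16.7%
```

Going from daily to weekly removes six out of every seven executions, which is where the "more than 80%" comes from. The second line shows why short-interval tasks are the first place to look: the same six extra days on a monthly task only trims the workload by about a sixth.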
Before I wrap up this article, I wanted to answer some of the most common FAQs relating to reliability centered maintenance, and these are:
What is reliability-centered maintenance?
Reliability-centered maintenance (RCM) is an internationally defined, structured decision-making process to develop or optimise a Preventive Maintenance Program. RCM focuses on preserving system functions rather than preserving equipment. The most fundamental requirement of any RCM process is that it must adequately and completely answer the following seven questions:
1. What are the functions and associated design performance standards for the asset in its current operating context?
2. In what way can the asset fail to fulfill its functions?
3. What causes each possible functional failure?
4. What happens when each failure occurs?
5. In what way does each failure matter?
6. What should we do to predict or prevent each failure?
7. What should we do if a suitable proactive task cannot be found?
When done well, RCM will deliver highly effective and efficient PM programs, but implementing RCM requires significant expertise and resources. So you need to use it wisely.
RCM is defined through a set of international standards: SAE JA1011 titled “Evaluation Criteria for Reliability Centered Maintenance Processes” and SAE standard JA1012 titled “A Guide to The Reliability Centered Maintenance Standard”.
What is the difference between RCM and FMEA?
Failure Mode and Effects Analysis (FMEA) is a step in the reliability centered maintenance (RCM) process, but RCM does a lot more than just analysing functional failures. It focuses on defining functions, being clear on the operating context and selecting the right maintenance tasks based on the analysis of the different failure modes.
I like to say that RCM is function-based, Preventive Maintenance Optimisation (PMO) is task-based and FMEA is equipment based, but all good analyses are failure mode based!
What are the types of reliability-centered maintenance?
Be very careful with the idea that there are ‘types of reliability centered maintenance’. People do talk about classical RCM and accelerated RCM. Classical RCM is the RCM process as originally defined by Nowlan & Heap and now documented in SAE JA1011 and SAE JA1012. Accelerated RCM is an adaptation of the classical RCM process, and there are quite a few variations – some are robust but others are not. Buyer beware!
What is the overall goal of RCM?
The overall goal of reliability-centered maintenance is to achieve the required reliability levels for a system, at optimised maintenance and cost levels by focusing on the preservation of key functions.
I wrote this article based on a number of key sources listed below (and throughout the article). I strongly recommend getting yourself a copy of Moubray’s book on Reliability Centered Maintenance if you don’t already own one. And I’d definitely get the NAVSEA Reliability Centered Maintenance (RCM) manual as it’s well-written and easy to understand:
- Moubray, J. (1997) Reliability Centered Maintenance Second Edition. Industrial Press. Available at: https://www.amazon.com/Reliability-Centered-Maintenance-Second-John-Moubray/dp/0831131462.
- NAVSEA (2007) Reliability Centered Maintenance (RCM) Handbook [S9081-AB-GIB-010]. Available at: https://www.amazon.com/NAVSEA-Reliability-Centered-Maintenance-RCM-Handbook-ebook/dp/B00U1UJPKK.
- Allen, T. M. (2001) ‘U.S. Navy Analysis of Submarine Maintenance Data and the Development of Age and Reliability Profiles’. Available at: http://www.plant-maintenance.com/articles/SubmarineMaintenanceDataRCM.pdf.
- White Paper (no date) ‘What is Reliability Centered Maintenance?’ Available at: https://www.mainsaver.com/pdf/Reliability_Centered_Maintenance_White_Paper.pdf.
- Wikipedia (2017) Reliability-centered maintenance. Available at: https://en.wikipedia.org/wiki/Reliability-centered_maintenance.
- NASA (2008) Reliability-Centered Maintenance Guide. Available at: https://www.nasa.gov/sites/default/files/atoms/files/nasa_rcmguide.pdf.
Hi Erik! I must say it was an informative and concise article on this very demanding subject. When it comes to getting senior management on board, I have found a cost-based analytical approach, to establish whether a maintenance task or a design change is worth doing, to be an effective tool.
Thanks Waseem. You’re absolutely right, if we can express the benefits of what we do in money (either as a cost saving or as production increase) it is much easier to get senior management on board.
Nice to note your efforts to improve the awareness for increased productivity through RCM.
The article has covered all the major area where normally people do err in the strategy.
I heard RCM performed in PDO yields the expected results.
Any project in RCM – please feel free to contact.
Dr Edwin Browne at Dr.Edwin.R.Browne@gmail.com
Very interesting and useful article Erik. It helped me a lot to figure out how to optimize the Preventive Maintenance tasks in my company, since most of them are a waste of resources.
Thanks a lot.
Hi Waleed, thanks for your feedback. Many PM programs are wasteful of resources, yours is definitely not the only one. Let us know how you go with improving it.
I worked mainly in turnaround projects but I would like to know the opinion of maintenance experts on the 5 points below.
1) Maintenance plans in the CMMS shall be complete workpacks for preventive and predictive tasks (only scheduling, no planning)
2) Avoid unnecessary or too frequent preventive maintenance tasks. Identify, categorize and focus in the CMMS on SCM (safety critical maintenance) and OCM (operation critical maintenance)
3) Schedule efficiently (extract the WO tasks to schedule directly from SAP) and with a sufficient level of detail in work order operation tasks in order to assess progress for each work centre and identify simile
4) Populate and update the CMMS correctly in order to plan work orders and assure a database for maintenance engineering. Move from OREDA data to company data for RAM and RCM analysis (each company is unique)
5) Assure competent resources for maintenance planning and execution and have good contracts (e.g. include KPIs, an exhaustive SOW, etc.)
Hi Dario, thanks for your comment, probably better suited with one of the planning & scheduling articles but that’s ok. I can’t answer all the queries here in a single comment, but for number (1) I agree that most of the work in CMMS should be fully planned, PM, PDM and CM. But even some PM’s can’t be fully scoped. For example in a complex petrochemical plant, you could end up doing major overhauls or inspections (during a shutdown) based on condition assessments which determine the final scope of work. This can never be fully finished in your CMMS and will require the planner to fine-tune the scope of work before the workorder & workpack is completed and issued.
Thank you Erik for this informative article. I am more into Reliability Engineering , the best approach for me has been to carry out an equipment criticality analysis, FMEA and developing maintenance tactics , analyze these to see which failure modes they address and take it from there. Failures will occur , but they can be mitigated. I also have a very useful manual – Rules of Thumb for Maintenance and Reliability Engineers.
Hi Nigel, thanks for your comment. Sounds like you have an effective approach in place and indeed “Rules of Thumb for Maintenance and Reliability Engineers” is a great book by Ricky Smith!
very interesting and precious book
Many thanks MR Erik Hupjé
Good one Eric. Very much pleased with the simple and concise manner of the article. My area of challenge is developing a good maintenance programme. I have observed that most of the maintenance programmes being carried out are not really value adding as they don’t prevent failure. This is because most of the programmes were lifted from OEM manuals without doing a proper RCM study on the asset. Most of the maintenance tasks were not preventing any failure mode and yet they were being carried out. The big question now becomes, “why are we still having so many equipment failures when we carry out preventive maintenance activities?”
I suggested carrying out PMO to remove non value adding tasks and probably replace with value added ones to improve equipment reliability. Gradually, things are changing.
Thanks for your comment Jude – great to hear that you initiated preventive maintenance optimisation in your plant and that you’re starting to see the results come through.
A great article. I am very impressed with your write-ups. Since I started following you on LinkedIn, I have agreed with you on almost every occasion, and this is one of those occasions. The article is just great; there are no other words to describe it.
In fact, in my few years of experience I have faced the problems mentioned above, mainly the last one, i.e. “it will take only 5 minutes more”. Those 5 minutes become 5 hours without anyone noticing.
Since our education, people have been taught that PM is a necessity. We buy a car and they ask us to replace the oil every XXXX km, and we do it religiously. We buy an AC and they ask us to get it serviced before and after summer. So the PM mentality makes a home in our minds, and slowly it turns from necessary into a NECESSARY EVIL. In my few years in the field, I have understood that the more you maintain, the higher the chances of equipment failure. Do not stop the equipment unnecessarily just to do a PM. Rather, do PdM to avoid starts and stops (the biggest cause of equipment stress, in my opinion).
Once again, thanks Erik for increasing our knowledge.
Hope to see you some place
Thanks! Glad to hear you enjoy the articles and that you recognise the issues based on your own experience. It’s a small world so who knows, we might one day meet indeed!
Thanks a lot Erik for your research and your article on modern maintenance. I appreciate it and it will be forwarded to my maintenance managers.
Good article explaining modern concepts of engineering management and the effect of using modern maintenance to solve engineering problems.
Excellent article, very well compiled. Especially the 9 principles made a very logical and interesting reading. Worth reading for every maintenance professional.
Thank you Deelip, glad you enjoyed it
Thanks for sharing your vision. This is a very good article, with historical references. I would like your permission to translate it into Spanish and publish it in the 21st edition of the Confiabilidad Industrial magazine. (www.confiabilidad.com.ve) Again thanks.
Sure David, please drop me an email at firstname.lastname@example.org so we can discuss practicalities and any help you might need.
I enjoyed reading every line of this article. It is very informative, and I am hoping that someday soon I will have the opportunity to develop and roll out a properly structured maintenance plan. Thanks once again.
Glad you enjoyed it! When the time comes feel free to reach out if you need help
Thanks Erik for sharing this article. It includes a great history of RCM as well as the 9 key principles of modern maintenance (the principles of RCM). It is worth publishing in a good journal. Thanks Erik.
Thanks Hesam. I hope to grow https://roadtoreliability.com to the equivalent of a good journal!
Hi Erik, it is a good article. I have experienced many times that the OEM uses a component on a piece of equipment that is designed to fail (shaft diameter, bearing type and size). There can be two theories: either the component failure prevents major damage, or it is meant to increase the sale of components. Since most OEMs do not give specifications, it becomes quite a task to redesign, even knowing full well that the component used is of the wrong design.
Yes, in the last four years, when I was working in an open pit copper mine, maintenance development was very important for improving all maintenance tasks on any machine or piece of equipment. But this is not the end, because we apply the same principles in other segments like civil construction, steel construction and hydraulic systems.
Working on the basis of a maintenance planning program and using other technical maintenance activities, like predictive maintenance, can be of great help.
And if we look further, we can use the quality management system to measure and improve maintenance even more.
You should also mention Predictive Maintenance software, which is becoming popular today.
see for example http://www.precog.co
Thanks for your comment Aviel. I think any predictive maintenance solution like your Precognize needs to be built on these key principles. But I agree that we are probably on the brink of a new Generation of Maintenance (the 5th depending on how you look at it) that will be heavily influenced by IIoT and AI… but the Big Data approach will just lead to Big Problems if we lose sight of the basics that underpin maintenance.
I did enjoy the article. Very informative.
Very nice and informative article. One basic principle of maintenance is not considered here, i.e. believe in your maintenance program and stay focused. It is very common that maintenance programs are hijacked by the production team, whose priorities are entirely different and often more influential. This pushes maintenance to divert resources to non-critical and unimportant tasks.
I enjoy reading your summary article.
Reliability basics are all about so-called “enablers” that build and motivate health.
We create ever more complexity in our machines and systems, designed by highly competent individuals who look at various operational requirements and/or windows of improvement opportunity. That is a great approach, but the technicians, engineers, operators or even product support are mostly not privy to the knowledge and/or skills needed to operate or support the “new” product, and only learn with time. Failure prevention starts on day ONE, NOT once a failure happens.
This comes down to having not only an effective operational readiness program, but also an enabling “health program” (in which all inputs and outputs need to be coordinated) that is understood at all levels of work, top to bottom.
Thanks for those educational articles, I have really learned a lot.
I work in water and sewage stations on 100 acres and more than 1000 units
First of all, a great article. If I may, I would add some additional items:
1. Preventive maintenance is not a constitution and can be changed as needed
2. Poor operation is the major problem and failure to take action before, during and after operation
3. Senior management and its correlation with the cost of maintenance with production
Hi Sir Erik,
Great article. I learned a lot from the history of maintenance up to the present, and I can now say that we are still practicing 2nd generation maintenance. I hope someday I can fully introduce RCM and a proactive approach in our maintenance practices.
Keep going, you are helping us a lot in the plant maintenance industry.
Hi Erik. A great yet simple way to help explain the basics of good, proactive maintenance programs to non-maintenance professionals (excellent for use with project engineers who have very limited maintenance understanding).
One minor point: on PRINCIPLE 5, I suggest you clarify that hidden failures must be found only if they are critical. I agree that most hidden failures are critical, but not 100% of them are, so the effort to find ALL of them, including the not-so-critical ones, might hinder the PM program. For example, a manual valve failing to close due to debris (most manual valves are not critical to the process design).
Thanks for the time taken to compile this article. Wise PM thoughts!
Hi Carlos, thanks for your comment. You’re 100% right: we don’t want to be looking for hidden failures that don’t really matter to our business. If we did, we would be wasting our valuable resources on PM tasks that don’t really matter. In doing so we’d be violating Principle #8. Thanks again, I’ll update the article to make this clearer.
Thanks Erik, this is a really useful history of maintenance. I realized many people are still ignorant about it.
Thank you Mohammed, we need to make a conscious effort to help those around us understand these basic principles of maintenance.
Thank you for writing this article Erik, it was very well written and easy to read. The history of maintenance is quite interesting, funny to see how the pendulum swung too far into the preventative maintenance direction from doing no preventative maint. It will be interesting to see if machine learning in the future could shake this up and eventually model accurate predictions of when random failures occur.
Thank you for a detailed explanation of all generations of maintenance. Also, I am thankful to you for mentioning 9 principles.
The myth of zero breakdowns has to be broken. These words look good in a vision or mission statement but are not realistic or practical. We have to expect and accept some failures that are beyond the scope of even the best maintenance plan. Not everything can be checked and tested in every condition (running or idle, slow or full speed), so not everything can be guaranteed.
This thinking needs to be developed in the production department (which owns the equipment) in the case of major MNCs. Our main motto should be INCREASED RELIABILITY only.
I am also of the opinion that different types of failures and equipment need different approaches. In the case of the furnace we have, we can’t afford a stoppage of the auto cycle, as it would lead to massive scrap costing millions. So the approach has to change considerably when we are handling batch-producing machinery versus single-part-producing machinery.
Also, nowadays IATF asks for a proactive approach to machine overhauls. So the machine is to be checked for actions against daily PMI, preventive maintenance, predictive maintenance and proactive overhaul. I think this will largely satisfy basic maintenance needs.
Thank you for the comment Nakul
Excellent, Erik. During my tenure I missed most of these points. I will implement them now.
Very nice and very informative. Keep sharing your knowledge.
Well done as always Erik. Great summary of history and the target state components that must be addressed. I find your writing good to keep me grounded in where I need to guide organizations.
The challenge in 2019 is how to be successful in the short, medium and long term. Today’s practitioners have different rules than those already successful with RCM. If you look at successful programs you will see 2 common threads: 1) a great leader with a long view; 2) an organization committed to 3-5 years of negative returns to set the foundation for decades of success. Most often today’s managers have neither of these. They have plant managers, VPs and Presidents who grew up in sales or finance, not on the shop floor.
They challenge managers for month-over-month improvement. Maintenance managers today would be fired after 6 months of negative results. We even call the effort a “journey”.
The reliability community of consultants either worked for a company that took on this journey years ago or has studied those successful. What we need to realize is that the rules have changed. Kind of like the lessons of WWI on the western front: you can’t fight today’s war with what was successful in the past. The advent of the machine gun made the bayonet charge suicide. The reliability consultants must come up with a solution for today’s “battle”.
I think the solution is to add “Lean” to our tool box and to use it first. I detail it in my YouTube videos titled “Reliability Man”. I’m sure there are other answers, but I do feel lonely offering something different.
Erik, Thanks for sharing this very informative article on basics of RCM.
I would suggest you to include the 10th point in your list on the ‘development of detailed procedures for the maintenance tasks’. In my experience, this is also a major reason for the failure of RCM, that is in the implementation side. You could do a very good RCM study, but if you have generic procedures, you would end up with the implementation failure.
Dear Mr Erik, I would like to thank you for the great article. The way you explained the evolution of maintenance and then moved on to the RCM principles was so good that it kept me eager to read the article to the end. Your words were very simple and clear.
Please keep posting good articles. I would also like to ask you to publish some real cases of applying RCM so people can understand the impact and apply it. As a maintenance planner I have understood the concept and the benefit for asset availability and reliability, but I’m struggling with how and where to start performing PMR, FMEA, etc.
Wow, Erik, this is superb!
Permit me to use this article for a presentation in my company.
I’ve really learned a lot. Thank you sir.
Thank you Nsukky, sure please feel free to use the article, but if you could reference the article as the source that would be appreciated.
Thanks for this very well researched and well written article. Being relatively new to the subject and responsible for our plant CMMS, we continually learn from our mistakes or assumptions. We have solved some of the reliability issues with our PMs on our critical assets (show stoppers), but we still have unexpected downtime in our plant for the unseen and often forgotten things like display screens and controls. The thing about these screens is that one day they are working just fine and the next they fail and bring down the plant. It’s not always possible to have a spare replacement display.
I do believe the real value in a maintenance program is to minimize your downtime by being proactive spending the same amount on your maintenance costs but experiencing less unplanned repairs.
Thanks for your comment David. A maintenance program built on these principles will certainly help to reduce downtime and ensure efficient use of your resource for those maintenance tasks that really add value. One thing to keep in mind is that sometimes we are faced with inherent defects in our plants and these we must eliminate, no amount of PM will help.
Hi dear Erik,
You have written a great article.
One other item that will have a big impact on repairs and maintenance is people, and how ready they are to do a great job.
Nobody can always do great work, so I think not just the equipment but also the people must be in a good frame of mind to do a good job. Many big incidents have happened because of very small mistakes by operations and operators.
Hi Hassan, thank you for your comment. You are totally right that motivation is a key contributor to good performance, and this is why leadership and culture are an important part of the Road to Reliability. With good leadership you can develop a positive culture and get the best out of your people.
You have clearly explained the pathway to efficient and productive maintenance in a plant.
There are always challenges in every organization in implementing and executing all these principles, and culture and leadership act as a lubricant that helps implementation.
I would suggest also explaining how to bring all teams’ attention and KPIs together to deliver this plan. Otherwise these principles are like a book on a shelf, without actual benefit.
Even if the facts have been known for a long time, your article is fun to read and a good summary of why to do things! Lots of CMMS/EAM implementations are primarily ‘IT-driven’ (implement the vendor’s features and functions and Do Things Right), and therefore maintenance departments get a more intelligent EXCEL spreadsheet (sorry Microsoft!) to replicate aging processes instead of implementing a custom-driven, optimized organisation and respective processes. The shorter half-life of today’s management and its neglect of the maintenance department’s mid/long-term success is only one of the obstacles to finding ways of Doing The Right Things! Once again, thanks for your book! It is never outdated and still necessary.
Thank you Hagen.
I like the DC8 vs DC10 slide.
Thanks Andrew, pretty amazing result isn’t it?
Thanks sir for the very important article and the manner that you give us the history of maintenance used and developed in the other domains aeronautical engineering and marine corps, thanks a lot.
You’re very welcome. I hope you found the article useful.
Excellent article. I wish more maintenance managers (and their managers) could be made to understand this. Over my career I have seen many wasted efforts and non-evidence based approaches to maintenance. A lot of these have been a result of senior management who don’t understand what maintenance is all about and maintenance managers who can’t communicate this effectively.
Hi Mike, thanks for your feedback. What you raise is exactly one of the goals of Road to Reliability: to influence how senior management see maintenance and help maintenance managers communicate more effectively to their management.
I agree with Mike. Sometimes it’s difficult to influence senior management due to the long hierarchy of approvals and the involvement of multiple departments for a small change in a complex organization.
But it’s always good to put in your own effort for your own satisfaction, and if it works out, it will be great for the organization and for your own learning.
This is true. Unfortunately, in most organisations the decision makers are non-technical, so they do not value or prioritise maintenance. Some even ask you ‘why fix it if it’s not broken?’. There is a real need to find a way to make non-technical managers understand the value of maintenance and the benefits of moving with technological trends in maintenance.
One seemingly minor defect picked during maintenance can save a whole drive train and plant at large.
This is excellent. It will make a lot of difference in the results achieved.
Excellent article. I totally agree with Mike Cook when it comes to management decisions on equipment maintenance. I wish a lot of maintenance managers and their senior managers understood this excellent article. I have seen many wasted efforts, even up till now. I need to secure a career opportunity in the future where I can utilize my diverse experience in the industry: providing the energy people need in a reliable and sustainable way, in an environment where I will get more exposure and get more people to join me in driving change in the maintenance organization.
Excellent article. Well Done Sir
Thank you – please feel free to share the article
Very interesting article. When configurations are as accurate as possible, you gain more benefits. I think (and I am not the only one) that your basics have to be right, and that starts with very accurate configurations.
Nicely explained, how to save scarce resources.
Thanks Deepak – if you’re interested in saving resources don’t forget to read the articles on planning & scheduling.
A very useful article, certainly for those young reliability engineers who need this simply explained information. As you stated in your text above, we tend to do way too much maintenance. First define the criticality of your equipment before you define the type of maintenance to be performed, and even consider whether it is necessary at all.
So again a great article.
Great article Erik. I was interested to hear the history behind it.
I moved to a company 3.5 years ago and after implementing the techniques mentioned above we have drastically improved reliability, although there is lots more to do.
Changing the mindset is often the biggest challenge.
Very useful article even in my field which deals with non mechanical items and their failure. Some of the similarities and terminology was very educating.
Hi, a nice clear read.
Principle #6 is all about context: maintain for the consequences driven by the context the equipment is placed in. A good example is, would you maintain the brakes on a van or an ambulance any differently? Or, would you check the starting system on them any differently? Clearly in the first case the safety consequences are the same regardless, whereas in the second the consequences are different and thus may call for different preventive maintenance. In both cases the equipment could be made from identical components.
Thanks Mike, indeed Principle 6 is all about operating context. I like your comparison of a normal van vs an ambulance.
Thanks Erik, it’s my first time reading your article and I found that some of the principles you mentioned save the company’s scarce resources and minimise downtime. I’m surely going to implement some of these principles.
Great stuff! It nicely sums up the Art of Maintenance and Reliability. I would be cautious against generalizing with the following quote
“Similarly, avoid assigning multiple tasks to a single failure mode. It’s wasteful and it makes it hard to determine which task is actually effective. Stick to the rule of a single, effective task per failure mode.”
After more than thirty years of aviation maintenance and reliability experience I know of plenty of cases where doing more than one single maintenance task per failure mode is required to obtain the best operational reliability and life cycle cost for the asset. It all boils down to analyzing the failure mode and its maintenance cost vs the benefits. I have seen too often where Quotes like these steer management into false assumptions on how this all works.
Hi Hans, thanks for your comment and feedback. I’ll tweak the text to make it clear that it should be seen as good practice to stick to one task per failure mode, but that there can be exceptions, especially where the consequences of the failure mode are very significant. It’s a fine balance between keeping things as simple as possible and generalizing too much.
Erik, Aladon RCM2 and now RCM3 has always catered for this specific situation through the specific logic on the decision diagrams
As a vendor, I focus on the overhaul and repair of industrial rotodynamic machinery.
Many of my customers carry out diagnostic checks and produce fantastic reports. As the overhauler, we can often see a direct connection between the defects and wear we find and the data from these diagnostic checks, and see the weaknesses, “sickness” or “diseases” in each particular piece of equipment. These findings and experiences normally do not reach to those maintaining the equipment or managing the RCM system.
Recently some of my customers tapped into my knowledge and experience to eliminate chronic issues that they had had for years.
Many of these failures can actually be traced to vendors (often the cheapest) who focus only on changing parts and take few measurements to check components for imperfections.
Some equipment failures result from the systems that the equipment is part of.
“These findings and experiences normally do not reach to those maintaining the equipment or managing the RCM system.”
Indeed a common problem but a clear example of a broken process – what is the point of doing these checks if the people who own the strategy don’t see the outcomes and results of their strategy? Maintenance needs a continuous improvement process and that means closing the loop.
Great Article! Worth Reading…
Absolutely fantastic read. We had just started to look at this when our terminal was taken over. I must get my new employer to get back on board.
Thank you so much for this useful article. By knowing and applying these tips, plant downtime can definitely be curtailed and reduced.
It’s funny. I have lived with RCM and facilitated RCM workshops for many years, but this is the first time I have learned the history of RCM and realized I was mistaken in thinking RCM originated in the military. I also developed RCM software based on FMECA, with a quantitative approach using Weibull analysis to automatically determine intervals and maintenance practices. But I got stuck on automated reporting, so I stopped for a while. Now that I have data science people, I will continue my RCM work as soon as possible.
Thanks for this concise and applicable article. I wonder if you could tell me how we can optimize the PM tasks, and especially the scheduled main overhaul, on gas turbines.
Please share the references where you have implemented this RCM
Thanks for your comment Askari, I have sent you an email.
Hi Erik, I am so glad you released this “walkthrough” of the benefits of RCM. The Q&A you received shows that its use is still limited and requires support to be understood by more and more people. I was, and still am, a supporter of this methodology, having been in charge of various maintenance organizations involved in manufacturing, design and integration. I was even proud to have met with John MOUBRAY when we both considered an extended version to be called RCM II. At that time, I was working on a mathematical model called the Reliability Growth Model, which later became MIL-HDBK-189C (SPLAN & SSPLAN). Our intent was to “plug in” a “simple .xls software” called NRG II. The phonetic version of this name was in fact “to”. Our idea was to promote “RCM II improve Direct Maintenance Cost, Total Cost of Ownership, Dispatch Reliability, … name it”. But John passed away before we could finalize the idea. John and I were in fact taking the opportunity of my being the Director of Maintenance Engineering at BOMBARDIER Aerospace and later Transportation. Very good memories of success stories for all my colleagues and customers. Please keep on promoting this concept, which protects us against the multiple AI initiatives that are progressively transforming us into “simple” actors replacing components because the “AI application” told us to do so. Our “brain” is our tool; we must preserve it and train it to control our destiny. Jean-Louis PEREE
Thank you for the comment Jean-Louis, I totally agree with you that even in a future full of AI and big data we will be still required to understand and apply the fundamentals.
Great. This blog is really worth reading. I will surely take it into consideration and use it as a guide in our maintenance program. Thanks Erik.
Thank you Rio
Great Article, very useful, i am going to implement in the plant !!!!
A great share that is very helpful to me. I am a learner who would like to study RCM and apply it in my job.
I have been looking for a handbook to help me put the theory into practice. Months ago I found a book, “Reliability Centered Maintenance (RCM): Implementation Made Simple”, written by Neil B. Bloom, a former colleague of John Moubray. But I have not seen any website, book or standard mention it as a reference. I would like your comment on whether this book is worth studying or not.
I’m not familiar with that book so I can’t really help in that respect. There are many books on RCM out there. I usually recommend Moubray’s book.
Erik, good article.
A point made by Heinz Bloch is that the best Organisations that truly understand reliability are the ones where maintenance and operations department work close together. So close in fact that at the drop of a hat, the maintenance manager swaps roles with the Ops manager for a period of time without any prior warning, enhancing the dynamic of the two departments to take a holistic approach to asset care.
Well done Erik as usual
I think with the development of IoT we will be better able to collect more data to monitor the patterns of each component, and I think with IoT we will move to a maintenance program driven by condition-based maintenance.
Two of the most significant ways businesses benefit include:
Predictive maintenance: Retrofitting your products with IoT-enabled sensors offers your organization real-time insight around how your goods are operating. The moment your business detects an anomaly, it can address the issue immediately. This helps avoid product failures, increase uptime, and extend the product life cycle.
Live engineering: IoT isn’t only essential to the maintenance of existing products. It’s crucial to the engineering of future products as well. With real-time sensor data at their disposal, product engineers can correct flaws they’ve discovered in previously manufactured goods and design better products going forward
And all this will enable a better Maintenance decision making process
Thank you sir.
These things are natural and universal truths. I love them, and they help me a lot in conducting training and in my research & development activities.
Very useful and important information, which is needed today in the manufacturing industry.
Really like the article, simple to follow and covers all the subjects/issues well. Thanks for your insights and this very useful tools for our journey along the “Road to Reliability”
Very useful and very important information for today’s situation. This information is really new to me. I will apply it in my industry and improve machine uptime.
Always enjoyable to review the history and why we should all embrace Reliability – well done!
Good write-up, but I think you have to include the competency of the operators as well, as we know that a large number of machine and equipment failures are the result of human error. Maybe not identifying the degradation of equipment in time, or not doing the needful at the right time.
You should share an e-book on PM. It’s a must.
Thank you Anand. In the next few weeks the website will be relaunched and all articles will become available as easy-to-download PDFs. And a book is being planned for 2022…
Which is the best company in Europe/UK to aid an organization in implementing RCM for process facilities?
My advice Umar would be not to look for a specific company, but for an RCM facilitator that has a very solid, proven track record and who is happy to share references from past clients.