This notebook focuses on predictive modeling of machine maintenance and failure metrics. Using a dataset with a categorical failure label, we perform exploratory data analysis (EDA), then build and tune models, and finish with data visualization.
library(tidyverse)   # tidyverse 2.0.0: dplyr, ggplot2, readr, tidyr, purrr, ...
library(tidymodels)  # tidymodels 1.2.0: recipes, parsnip, tune, yardstick, ...
library(caret)       # also attaches lattice
# Note: caret masks yardstick::precision(), recall(), sensitivity() and
# specificity(), as well as purrr::lift(). Call yardstick::sensitivity() etc.
# explicitly when computing tidymodels metrics.
machine <- read_csv("Downloads/predictive_maintenance.csv")
## Rows: 10000 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Product ID, Type, Failure Type
## dbl (7): UDI, Air temperature [K], Process temperature [K], Rotational speed...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(machine)
## [1] "UDI" "Product ID"
## [3] "Type" "Air temperature [K]"
## [5] "Process temperature [K]" "Rotational speed [rpm]"
## [7] "Torque [Nm]" "Tool wear [min]"
## [9] "Target" "Failure Type"
summary(machine)
## UDI Product ID Type Air temperature [K]
## Min. : 1 Length:10000 Length:10000 Min. :295.3
## 1st Qu.: 2501 Class :character Class :character 1st Qu.:298.3
## Median : 5000 Mode :character Mode :character Median :300.1
## Mean : 5000 Mean :300.0
## 3rd Qu.: 7500 3rd Qu.:301.5
## Max. :10000 Max. :304.5
## Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min]
## Min. :305.7 Min. :1168 Min. : 3.80 Min. : 0
## 1st Qu.:308.8 1st Qu.:1423 1st Qu.:33.20 1st Qu.: 53
## Median :310.1 Median :1503 Median :40.10 Median :108
## Mean :310.0 Mean :1539 Mean :39.99 Mean :108
## 3rd Qu.:311.1 3rd Qu.:1612 3rd Qu.:46.80 3rd Qu.:162
## Max. :313.8 Max. :2886 Max. :76.60 Max. :253
## Target Failure Type
## Min. :0.0000 Length:10000
## 1st Qu.:0.0000 Class :character
## Median :0.0000 Mode :character
## Mean :0.0339
## 3rd Qu.:0.0000
## Max. :1.0000
sum(is.na(machine))
## [1] 0
library(corrplot)
## corrplot 0.94 loaded
machine %>%
  select(where(is.numeric)) %>%     # numeric columns only (UDI and Target included)
  drop_na() %>%
  cor() %>%
  corrplot(addCoef.col = "black")   # print each coefficient on the plot
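The correlation plot shows a clear negative correlation between rotational
speed and torque. This is consistent with the machine operating at roughly
constant power: mechanical power is torque times angular velocity
(P = τω), so at a fixed power, higher torque goes with lower rotational
speed. There is also a positive correlation between process temperature
and air temperature.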
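The power argument can be made concrete with a small calculation. For a speed n in rpm, the angular velocity is ω = 2πn/60 rad/s, so power in watts is P = τ · 2πn/60. The sketch below uses a hypothetical helper, `mech_power_w()`, which is not part of the original notebook, and applies it to made-up readings: three very different torque/speed pairs give exactly the same power, which is the kind of inverse relationship the correlation suggests.

```r
# Hypothetical helper: mechanical shaft power in watts from torque (Nm) and
# rotational speed (rpm), using P = torque * omega with omega = 2 * pi * rpm / 60.
mech_power_w <- function(torque_nm, speed_rpm) {
  torque_nm * 2 * pi * speed_rpm / 60
}

# Made-up readings for illustration only (not taken from the dataset):
# torque doubles, then triples, while speed halves, then drops to a third.
demo <- data.frame(
  torque_nm = c(20, 40, 60),
  speed_rpm = c(3000, 1500, 1000)
)
demo$power_w <- mech_power_w(demo$torque_nm, demo$speed_rpm)
print(demo)
# Every row has power_w = 2000 * pi, about 6283 W
```

On the real data, the same helper could be applied to the torque and rotational speed columns with `mutate()` and the resulting power inspected with `summary()`. Whether this machine actually runs near constant power is an assumption to check that way, not something established by the correlation alone.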
machine %>%
ggplot(aes(`Torque [Nm]`, `Rotational speed [rpm]`))+
geom_point()+
geom_smooth(method= 'lm')
## `geom_smooth()` using formula = 'y ~ x'
table(machine$`Failure Type`)
##
## Heat Dissipation Failure No Failure Overstrain Failure
## 112 9652 78
## Power Failure Random Failures Tool Wear Failure
## 95 18 45
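The table shows a heavy class imbalance. A quick base R calculation on these counts (copied by hand from the output above, so no new data is involved) gives each class's share. It also exposes a small inconsistency between the two label columns: 348 rows carry a failure type other than "No Failure", while the mean of `Target` (0.0339 over 10000 rows, from `summary()`) implies 339 rows with `Target = 1`.

```r
# Failure-type counts copied from the table() output above
failure_counts <- c(
  "Heat Dissipation Failure" = 112,
  "No Failure"               = 9652,
  "Overstrain Failure"       = 78,
  "Power Failure"            = 95,
  "Random Failures"          = 18,
  "Tool Wear Failure"        = 45
)

# Share of each class, in percent
print(round(100 * prop.table(failure_counts), 2))

# Rows labeled with some failure type (everything except "No Failure")
n_typed_failures <- sum(failure_counts[names(failure_counts) != "No Failure"])

# Rows with Target == 1, implied by mean(Target) = 0.0339 over 10000 rows
n_target_failures <- round(0.0339 * 10000)

cat("Failure-type rows:", n_typed_failures,
    "| Target == 1 rows:", n_target_failures, "\n")
```

Cross-tabulating the two labels on the real data, for example with ``table(machine$Target, machine$`Failure Type`)``, would show exactly which rows disagree. The imbalance also means plain accuracy is a misleading metric for the models later on: a model that always predicts "No Failure" would already be correct on 96.52% of rows.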
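# Standardize column names (e.g. "Air temperature [K]" -> air_temperature_k),
# then drop the ID columns along with rotational speed and air temperature,
# which are correlated with torque and process temperature respectively.
machine2 <- machine %>%
  janitor::clean_names() %>%
  select(-udi, -product_id, -rotational_speed_rpm, -air_temperature_k)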
head(machine2)
## # A tibble: 6 × 6
## type process_temperature_k torque_nm tool_wear_min target failure_type
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 M 309. 42.8 0 0 No Failure
## 2 L 309. 46.3 3 0 No Failure
## 3 L 308. 49.4 5 0 No Failure
## 4 L 309. 39.5 7 0 No Failure
## 5 L 309. 40 9 0 No Failure
## 6 M 309. 41.9 11 0 No Failure
summary_table <- machine2 %>%
group_by(failure_type) %>%
summarise(process_temp_avg = mean(process_temperature_k),
torque_avg = mean(torque_nm),
tool_wear_min_avg = mean(tool_wear_min))
ggplot(summary_table, aes(failure_type, process_temp_avg))+
geom_col(aes(fill = failure_type))+
ggtext::geom_richtext(aes(label = round(process_temp_avg, 2)), alpha = 0.7,size = 6)+
ggtitle('Process Temperature per Failure')+
ylab("Temperature [K]")+
theme_bw()+
theme(axis.title.x = element_blank(),
plot.title = element_text(hjust = 0.5),
text = element_text(size = 9),
legend.title = element_blank())
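As the chart shows, the average process temperature does not vary much
from one failure type to another. This is expected: every process
temperature in the data lies between 305.7 K and 313.8 K (see `summary()`
above), so the group averages can be at most 8.1 K apart, and bars drawn
from 0 K make even those differences hard to see. Process temperature on
its own is therefore unlikely to separate the failure types, so we will
not spend much time relating temperature changes to specific failures.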
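One rough way to put a number on "does not vary much" is to compare the spread of the per-group averages with the overall standard deviation of the variable. The helper below, `group_mean_spread()`, is a hypothetical name that does not appear in the original notebook. It is a descriptive heuristic in base R, not a significance test, and with classes as small as 18 rows the group means themselves are noisy.

```r
# Hypothetical helper: how far apart are the group means, compared with the
# overall spread of x? A small ratio suggests x alone separates the groups poorly.
group_mean_spread <- function(x, group) {
  group_means    <- tapply(x, group, mean)      # mean of x within each group
  range_of_means <- diff(range(group_means))    # largest minus smallest group mean
  c(range_of_means = range_of_means,
    overall_sd     = sd(x),
    ratio          = range_of_means / sd(x))
}

# Toy check with made-up numbers: group means are 2 and 6, so the range is 4
print(group_mean_spread(x = c(1, 3, 5, 7), group = c("a", "a", "b", "b")))
```

On the notebook data this would be called as `group_mean_spread(machine2$process_temperature_k, machine2$failure_type)`, and likewise with `torque_nm` and `tool_wear_min`, to compare the three variables on a common, unit-free scale.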
ggplot(summary_table, aes(failure_type, tool_wear_min_avg))+
geom_col(aes(fill = failure_type))+
ggtext::geom_richtext(aes(label = round(tool_wear_min_avg, 2)), alpha = 0.7, size = 6)+
ggtitle('Tool Wear Avg per Failure')+
ylab("Tool Wear [min]")+
theme_bw()+
theme(axis.title.x = element_blank(),
plot.title = element_text(hjust = 0.5),
text = element_text(size = 9),
legend.title = element_blank())
ggplot(summary_table, aes(failure_type, torque_avg))+
geom_col(aes(fill = failure_type))+
ggtext::geom_richtext(aes(label = round(torque_avg, 2)), alpha = 0.7, size = 6)+
ggtitle('Torque Avg per Failure')+
ylab("Torque [Nm]")+
theme_bw()+
theme(axis.title.x = element_blank(),
plot.title = element_text(hjust = 0.5),
text = element_text(size = 9),
legend.title = element_blank())
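The three bar charts share everything except the plotted column, the title, and the y-axis label, so a helper that takes those as arguments avoids copy-paste mistakes such as labeling one chart with another chart's values. This is a sketch: `plot_failure_avg()` is a hypothetical name, and it uses ggplot2's own `geom_label()` instead of `ggtext::geom_richtext()` so that ggplot2 is the only dependency. Columns are selected by name through the `.data` pronoun.

```r
library(ggplot2)

# Hypothetical helper: bar chart of one per-failure-type average, with each
# bar labeled by the same column it plots, so bar and label cannot disagree.
plot_failure_avg <- function(data, value_col, title, y_label) {
  ggplot(data, aes(x = failure_type, y = .data[[value_col]])) +
    geom_col(aes(fill = failure_type)) +
    geom_label(aes(label = round(.data[[value_col]], 2)), alpha = 0.7, size = 6) +
    ggtitle(title) +
    ylab(y_label) +
    theme_bw() +
    theme(axis.title.x = element_blank(),
          plot.title = element_text(hjust = 0.5),
          text = element_text(size = 9),
          legend.title = element_blank())
}

# Made-up values for illustration only; on the notebook data pass summary_table.
demo_summary <- data.frame(
  failure_type = c("Type A", "Type B", "Type C"),
  torque_avg   = c(10.123, 20.456, 30.789)
)
p <- plot_failure_avg(demo_summary, "torque_avg", "Torque Avg per Failure", "Torque [Nm]")
print(p)
```

With the real data, `plot_failure_avg(summary_table, "tool_wear_min_avg", "Tool Wear Avg per Failure", "Tool Wear [min]")` would draw the tool-wear chart, with plain rather than rich-text labels.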