This analysis examines whether clustering on a vehicle's variables can be used to predict its condition. We can summarize the analysis in the following question.
> Can we use clustering analysis to determine the condition of the vehicle?
A notable feature of this data is that it is heavily skewed toward categorical variables, with relatively few numerical ones. The overall approach taken is as follows:
1. Perform the clustering on the numerical data.
2. Rejoin the clustering output dataframes back to the original dataset, so that we do not use `condition` as a predictor.
3. Examine the results of vehicle condition by cluster.
4. Run clustering over different variations of variables.
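The steps above can be sketched as follows. This is a minimal illustration, not the original notebook code: the tiny dataframe is a hypothetical stand-in for the vehicle dataset, and the column names and `n_clusters=3` follow the run discussed below.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering

# Hypothetical stand-in for the vehicle dataset.
df = pd.DataFrame({
    "price":     [5000, 27000, 14000, 31000, 8000, 22000],
    "odometer":  [150000, 20000, 80000, 5000, 120000, 40000],
    "year":      [2005, 2019, 2013, 2021, 2008, 2016],
    "condition": ["good", "excellent", "good", "new", "fair", "excellent"],
})

# 1. Cluster on the numerical columns only, scaled so no single variable dominates.
num_cols = ["price", "odometer", "year"]
X = StandardScaler().fit_transform(df[num_cols])

# 2. Rejoin the cluster labels to the original data; `condition` is never a predictor.
df["kmeans_3"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
df["ward_3"] = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# 3. Examine vehicle condition by cluster.
print(pd.crosstab(df["ward_3"], df["condition"]))
```

Step 4 then repeats the same fit over different subsets of the numerical columns.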
We inspected the output of the clustering and contrasted it against the vehicles' conditions. The Agglomerative method did not produce a useful result. However, the Divisive and kmeans methods did produce something interesting in each run. Looking at the last run, for the variables `price`, `odometer`, and `year`, the Divisive ward algorithm with 3 clusters was a strong predictor of `excellent` condition vehicles, and the kmeans algorithm with 3 clusters was a strong predictor of `new` condition vehicles. See the output below: for the ward_3 algorithm, we may use group 2 to flag `excellent` condition vehicles, and for the kmeans_3 algorithm, we may predict `new` condition vehicles.
```
ward_3
group 1
excellent fair good like new new salvage
4335 341 3423 455 32 13
group 2
excellent fair good like new new salvage
10479 129 5247 1778 39 18
group 3
excellent fair good like new new salvage
2672 16 1074 1530 98 4
----
kmeans_3
group 1
excellent fair good like new new salvage
6118 42 2615 2312 111 13
group 2
excellent fair good like new new salvage
9643 229 5470 1286 57 12
group 3
excellent fair good like new new salvage
1725 215 1659 165 1 10
```
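To make the "strong predictor" reading concrete, we can compute, for each condition, what fraction of its vehicles each ward_3 cluster captures. The counts below are copied from the ward_3 output above; the dataframe construction itself is just a convenience for the calculation.

```python
import pandas as pd

# Condition counts per ward_3 cluster, copied from the output above.
conditions = ["excellent", "fair", "good", "like new", "new", "salvage"]
ward_3 = pd.DataFrame(
    [[4335, 341, 3423, 455, 32, 13],
     [10479, 129, 5247, 1778, 39, 18],
     [2672, 16, 1074, 1530, 98, 4]],
    index=["group 1", "group 2", "group 3"],
    columns=conditions,
)

# Column-wise shares: each condition sums to 1 across the three clusters.
shares = ward_3 / ward_3.sum(axis=0)
print(shares.round(2))
```

Group 2 captures roughly 60% of all `excellent` vehicles (10479 of 17486), which is what motivates using it as the `excellent` cluster.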
## Clustering Conclusion
While our dataset lacked an expansive range of continuous variables, the limited variables still played a crucial role in our clustering analysis. Not every clustering algorithm we tested proved effective. Nonetheless, the Divisive ward and kmeans algorithms, each with three clusters, yielded particularly intriguing outcomes: these configurations demonstrated a robust predictive capacity, especially in categorizing vehicles as being in `excellent` or `new` condition.