This analysis aims to assess the viability of using clustering on a set of variables to predict a vehicle's condition. We can summarize this analysis with the following question.
> Can we use clustering analysis to determine the condition of the vehicle?
A notable feature of this data is that it is heavily skewed towards categorical variables, with comparatively few numerical ones. The overall approach taken is as follows:
1. Perform the clustering on the numerical data.
2. Rejoin the clustering output dataframes to the original dataset, so that `condition` is not used as a predictor.
3. Examine the distribution of vehicle condition by cluster.
4. Run the clustering over different combinations of variables (a sketch of this workflow follows the list).
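The analysis code itself is not reproduced here, but the workflow above can be sketched roughly as follows. This is a minimal Python/scikit-learn sketch under stated assumptions: the file name `vehicles.csv` is hypothetical, the numeric columns are taken to be `price`, `odometer`, and `year` from the text, and scikit-learn's agglomerative Ward clustering stands in for the Divisive method mentioned later (scikit-learn does not ship a divisive implementation).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering

# Hypothetical file name -- the source data file is not named in this report.
df = pd.read_csv("vehicles.csv")

# Step 1: cluster on the numerical variables only.
numeric_cols = ["price", "odometer", "year"]
clustered = df.dropna(subset=numeric_cols).copy()
X = StandardScaler().fit_transform(clustered[numeric_cols])

# Step 2: attach the cluster labels back to the original rows, so that
# `condition` is never used as an input to the clustering itself.
clustered["kmeans_3"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
clustered["ward_3"] = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
```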
We inspected the output of the clustering and contrasted it against the vehicles' conditions. The Agglomerative method did not produce a useful result. However, the Divisive and kmeans methods each produced something interesting in every run. Looking at the last run over `price`, `odometer`, and `year`, the Divisive Ward algorithm with 3 clusters was a strong predictor of `excellent` condition vehicles, and the kmeans algorithm with 3 clusters was a strong predictor of `new` condition vehicles. See the output below.
For the ward_3 algorithm with 3 clusters, we may use group 2 to identify `excellent` condition vehicles. For the kmeans algorithm with 3 clusters, we may use its clusters to predict `new` condition vehicles. Collapsing the `condition` variable did not produce a more accurate reading.
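The per-group condition counts shown in the output below can be produced with a pandas crosstab. A minimal sketch, continuing from the hypothetical `clustered` dataframe above (step 3 of the approach):

```python
# Step 3: examine the distribution of vehicle condition within each cluster.
ward_table = pd.crosstab(clustered["ward_3"], clustered["condition"])
kmeans_table = pd.crosstab(clustered["kmeans_3"], clustered["condition"])

print("ward_3")
print(ward_table)
print()
print("kmeans_3")
print(kmeans_table)
```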
```
ward_3
group 1
...
group 2
...
group 3
excellent    fair    good    like new    new    salvage
     1725     215    1659         165      1         10
```
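The remark above about collapsing `condition` refers to merging its six levels into fewer, broader categories before comparing them against the clusters. The exact grouping used in the analysis is not stated, so the mapping below is purely illustrative; it continues from the same hypothetical `clustered` dataframe.

```python
# Illustrative three-level collapse of `condition`; the report does not state
# which grouping was actually tried.
collapse_map = {
    "new": "high",
    "like new": "high",
    "excellent": "high",
    "good": "mid",
    "fair": "low",
    "salvage": "low",
}
clustered["condition_collapsed"] = clustered["condition"].map(collapse_map)
print(pd.crosstab(clustered["ward_3"], clustered["condition_collapsed"]))
```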
## Clustering Conclusion
While our dataset lacked an expansive range of continuous variables, the few available still played a crucial role in our clustering analysis. Among the various clustering algorithms tested, not all proved effective. Nonetheless, the Divisive Ward and kmeans algorithms, specifically with three clusters, yielded particularly intriguing outcomes. These configurations demonstrated a robust predictive capacity, especially in categorizing vehicles as being in `excellent` or `new` condition.