Use R to answer the following clustering questions:
Here’s a dput of my CSV dataset I am using:
```
structure(list(weld.type.ID = 1:33, weld.type = structure(c(29L,
11L, 16L, 4L, 28L, 17L, 19L, 5L, 24L, 27L, 21L, 32L, 12L, 20L,
26L, 25L, 3L, 7L, 13L, 22L, 33L, 1L, 9L, 10L, 18L, 15L, 31L,
8L, 23L, 2L, 14L, 6L, 30L), .Label = c("1,40,Material A", "1,40S,Material C",
"1,80,Material A", "1,STD,Material A", "1,XS,Material A", "10,10S,Material C",
"10,160,Material A", "10,40,Material A", "10,40S,Material C",
"10,80,Material A", "10,STD,Material A", "10,XS,Material A",
"13,40,Material A", "13,40S,Material C", "13,80,Material A",
"13,STD,Material A", "13,XS,Material A", "14,40,Material A",
"14,STD,Material A", "14,XS,Material A", "15,STD,Material A",
"15,XS,Material A", "2,10S,Material C", "2,160,Material A", "2,40,Material A",
"2,40S,Material C", "2,80,Material A", "2,STD,Material A", "2,XS,Material A",
"4,80,Material A", "4,STD,Material A", "6,STD,Material A", "6,XS,Material A"
), class = "factor"), alpha = c(281L, 196L, 59L, 96L, 442L, 98L,
66L, 30L, 68L, 43L, 35L, 44L, 23L, 14L, 24L, 38L, 8L, 8L, 5L,
19L, 37L, 38L, 6L, 11L, 29L, 6L, 16L, 6L, 16L, 3L, 4L, 9L, 12L
), beta = c(7194L, 4298L, 3457L, 2982L, 4280L, 3605L, 2229L,
1744L, 2234L, 1012L, 1096L, 1023L, 1461L, 1303L, 531L, 233L,
630L, 502L, 328L, 509L, 629L, 554L, 358L, 501L, 422L, 566L, 403L,
211L, 159L, 268L, 167L, 140L, 621L), Median = c(0.0375507383753025,
0.043546015959685, 0.0166888869351212, 0.0310875876067419, 0.0935470294716035,
0.0263798143584636, 0.0286213698125569, 0.0167296957822645, 0.029403369311426,
0.0404683392593359, 0.0306699148693358, 0.0409507113292405, 0.0152814823151512,
0.0103834693100336, 0.0426953962552843, 0.139335880048896, 0.0120333156133183,
0.0150573864235556, 0.0140547965388361, 0.0354001989345449, 0.0551110033888123,
0.0636987097619679, 0.0156058684578843, 0.0208640835981798, 0.0636580207464108,
0.00992440459162821, 0.0374531528739036, 0.0262100640799903,
0.0898729525910631, 0.00989157442426205, 0.0215577154517479,
0.0584418091169483, 0.0184528408043719)), class = "data.frame", row.names = c(NA,
-33L))
```
For the K-Means clustering algorithm, determining the hyperparameter K is a common problem. The correct choice of K is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a dataset and on the clustering resolution the user wants. In this step, you are asked to select the best number of clusters using the elbow method, and then to perform K-means clustering with it. The error is defined as the distance between an object and the mean of the cluster it belongs to. The sum of squared errors (SSE) is used as the objective function value.

Questions:
(a) Visualize the relationship between the objective function value (SSE) and the number of clusters K.
(b) Select the best number of clusters based on the elbow method.
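One possible way to approach (a) and (b) in base R is sketched below; it is not a definitive solution. Several choices here are assumptions rather than part of the question: only the numeric columns `alpha`, `beta` and `Median` are clustered, they are standardized with `scale()` because their magnitudes differ widely, K is tried from 1 to 10, and the "largest second difference" rule at the end is just a rough numeric stand-in for reading the elbow off the plot by eye.

```r
# Numeric columns copied from the dput in the question. weld.type.ID and
# weld.type are treated as identifiers, not features (assumption of this sketch).
welds <- data.frame(
  alpha = c(281, 196, 59, 96, 442, 98, 66, 30, 68, 43, 35, 44, 23, 14, 24,
            38, 8, 8, 5, 19, 37, 38, 6, 11, 29, 6, 16, 6, 16, 3, 4, 9, 12),
  beta = c(7194, 4298, 3457, 2982, 4280, 3605, 2229, 1744, 2234, 1012, 1096,
           1023, 1461, 1303, 531, 233, 630, 502, 328, 509, 629, 554, 358,
           501, 422, 566, 403, 211, 159, 268, 167, 140, 621),
  Median = c(0.0375507383753025, 0.043546015959685, 0.0166888869351212,
             0.0310875876067419, 0.0935470294716035, 0.0263798143584636,
             0.0286213698125569, 0.0167296957822645, 0.029403369311426,
             0.0404683392593359, 0.0306699148693358, 0.0409507113292405,
             0.0152814823151512, 0.0103834693100336, 0.0426953962552843,
             0.139335880048896, 0.0120333156133183, 0.0150573864235556,
             0.0140547965388361, 0.0354001989345449, 0.0551110033888123,
             0.0636987097619679, 0.0156058684578843, 0.0208640835981798,
             0.0636580207464108, 0.00992440459162821, 0.0374531528739036,
             0.0262100640799903, 0.0898729525910631, 0.00989157442426205,
             0.0215577154517479, 0.0584418091169483, 0.0184528408043719)
)

# Standardize each column (assumption: without this, beta would dominate
# the Euclidean distances used by k-means).
features <- scale(welds)

# (a) SSE for each candidate K. tot.withinss is the total within-cluster
# sum of squared distances to the cluster means.
set.seed(42)            # k-means starts from random centers
k_values <- 1:10        # assumed search range
sse <- sapply(k_values, function(k) {
  kmeans(features, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

plot(k_values, sse, type = "b", pch = 19,
     xlab = "Number of clusters K",
     ylab = "Total within-cluster SSE",
     main = "Elbow plot")

# (b) Rough numeric aid: the K where the curve bends most sharply, i.e. the
# largest second difference of SSE. Element i of the second difference is
# centered on K = i + 1. This is a heuristic, not the only way to pick K.
second_diff <- diff(sse, differences = 2)
elbow_k <- k_values[which.max(second_diff) + 1]

print(data.frame(K = k_values, SSE = round(sse, 3)))
cat("Suggested K from the second-difference heuristic:", elbow_k, "\n")
```

The value in `elbow_k` is worth checking against the plot: the elbow is the K after which adding another cluster reduces the SSE only marginally. Once a K is chosen, `kmeans(features, centers = elbow_k, nstart = 25)$cluster` gives the cluster label for each weld type.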