KMeans is an unsupervised machine learning model that predicts which cluster an input belongs to, based on its distance to each cluster's centroid.
Code Sample
local featureMatrix = {
{-900, -700},
{-400, -600},
{-500, -300},
{900, 900},
{100, 100},
{100, 100},
{300, 400},
{100, 500},
{-50, -600}
}
local testFeatureMatrix = {
{-300, -500},
}
local modelParameters, cost = MachineLL.KMeans:train(featureMatrix, nil, nil, nil, nil, nil, false, false, false, false)
local clusterNumber, shortestDistance = MachineLL.KMeans:predict(testFeatureMatrix, modelParameters)
Functions
:train()
train(featureMatrix: matrix, numberOfClusters: int, maxNumberOfIterations: int, learningRate: number, distanceFunction: string, targetCost: number, setInitialClustersOnDataPoints: boolean, setTheCentroidsDistanceFarthest: boolean, stopWhenModelParametersDoesNotChange: boolean, suppressOutput: boolean): matrix, number
Arguments:
featureMatrix: The matrix containing the values for the model to train on.
numberOfClusters: The number of clusters for the model to train and predict on.
maxNumberOfIterations: The maximum number of training iterations.
learningRate: The learning rate for the model (values between 0 and 1 are recommended).
distanceFunction: The distance function that the model uses for training. Available options are "euclidean" and "manhattan".
targetCost: The target cost at which the model stops training.
setInitialClustersOnDataPoints: Whether the model places its initial centroids on data points.
setTheCentroidsDistanceFarthest: Whether the model creates initial centroids that are farthest from each other.
stopWhenModelParametersDoesNotChange: Whether to stop training when the model parameters do not change from the previous iteration.
suppressOutput: Whether to suppress the display of the number of iterations and the cost.
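To make the roles of numberOfClusters, maxNumberOfIterations and the "stop when nothing changes" option more concrete, here is a minimal sketch of a standard k-means iteration in plain Lua. It is not MachineLL's implementation: the helper names (euclidean, nearestCentroid, trainSketch) are hypothetical, the initial centroids are assumed to be the first rows of the data, only Euclidean distance is used, and the learning rate and target cost are left out.

```lua
-- Illustrative sketch only; these helpers are hypothetical and not part of MachineLL.
local function euclidean(a, b)
    local sum = 0
    for i = 1, #a do
        sum = sum + (a[i] - b[i]) ^ 2
    end
    return math.sqrt(sum)
end

-- Returns the index of the closest centroid and the distance to it.
local function nearestCentroid(point, centroids)
    local best, bestDistance = nil, math.huge
    for index, centroid in ipairs(centroids) do
        local d = euclidean(point, centroid)
        if d < bestDistance then
            best, bestDistance = index, d
        end
    end
    return best, bestDistance
end

local function trainSketch(featureMatrix, numberOfClusters, maxNumberOfIterations)
    -- Assumption: copy the first k rows as initial centroids so the run is deterministic.
    local centroids = {}
    for k = 1, numberOfClusters do
        centroids[k] = {}
        for j = 1, #featureMatrix[k] do
            centroids[k][j] = featureMatrix[k][j]
        end
    end

    local cost = math.huge
    for _ = 1, maxNumberOfIterations do
        -- Assignment step: collect per-cluster sums and the total distance (the cost).
        local sums, counts = {}, {}
        for k = 1, numberOfClusters do
            sums[k], counts[k] = {}, 0
            for j = 1, #featureMatrix[1] do sums[k][j] = 0 end
        end
        cost = 0
        for _, point in ipairs(featureMatrix) do
            local k, d = nearestCentroid(point, centroids)
            cost = cost + d
            counts[k] = counts[k] + 1
            for j = 1, #point do sums[k][j] = sums[k][j] + point[j] end
        end

        -- Update step: move each non-empty centroid to the mean of its points.
        local changed = false
        for k = 1, numberOfClusters do
            if counts[k] > 0 then
                for j = 1, #centroids[k] do
                    local mean = sums[k][j] / counts[k]
                    if mean ~= centroids[k][j] then changed = true end
                    centroids[k][j] = mean
                end
            end
        end

        -- Stop early once the centroids no longer move.
        if not changed then break end
    end
    -- cost is the summed distance measured during the last assignment step.
    return centroids, cost
end

local modelParameters, cost = trainSketch({ {0, 0}, {10, 10}, {0, 2}, {10, 12} }, 2, 100)
print(modelParameters[1][1], modelParameters[1][2], modelParameters[2][1], modelParameters[2][2], cost)
```

In this sketch the returned table holds one centroid per row, which is one plausible reading of the model parameters matrix that train() returns; the library's actual layout may differ.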
:predict()
predict(featureMatrix: matrix, modelParameters: matrix, distanceFunction: string): number, number
Arguments:
featureMatrix: The matrix containing the values for the model to predict on.
modelParameters: The matrix generated from training the model.
distanceFunction: The distance function that the model uses for prediction. Available options are "euclidean" and "manhattan".
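The two return values of predict() can be pictured as "index of the nearest centroid" and "distance to that centroid". The following is a rough sketch of that idea in plain Lua, not the library's code: distanceFunctions and predictCluster are hypothetical names, the centroid values are made up, and falling back to "euclidean" when no distance function is given is an assumption of this sketch.

```lua
-- Hypothetical helpers for illustration; not part of MachineLL.
local distanceFunctions = {
    euclidean = function(a, b)
        local sum = 0
        for i = 1, #a do
            sum = sum + (a[i] - b[i]) ^ 2
        end
        return math.sqrt(sum)
    end,
    manhattan = function(a, b)
        local sum = 0
        for i = 1, #a do
            sum = sum + math.abs(a[i] - b[i])
        end
        return sum
    end,
}

-- Returns the index of the nearest centroid and the distance to it.
local function predictCluster(featureVector, centroids, distanceFunctionName)
    -- Assumption: default to "euclidean" when no name is passed.
    local distance = distanceFunctions[distanceFunctionName or "euclidean"]
    local clusterNumber, shortestDistance = nil, math.huge
    for index, centroid in ipairs(centroids) do
        local d = distance(featureVector, centroid)
        if d < shortestDistance then
            clusterNumber, shortestDistance = index, d
        end
    end
    return clusterNumber, shortestDistance
end

-- Assumed centroids, one row per cluster.
local centroids = { {-500, -500}, {400, 400} }
local clusterNumber, shortestDistance = predictCluster({-300, -500}, centroids, "euclidean")
print(clusterNumber, shortestDistance)
```

Here the point {-300, -500} lies 200 units from the first centroid and much farther from the second, so the sketch reports cluster 1.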
Notes:
If setInitialClustersOnDataPoints and setTheCentroidsDistanceFarthest are both set to true, expect slow performance while the model initializes. How long this lasts depends on the amount of data given.
Ensure that the distanceFunction used for training and for predicting is the same for the best accuracy.
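To see why the distance function should match, consider a small standalone example in plain Lua where the two metrics disagree about the nearest centroid. The helper names and the centroid values below are made up for illustration and are not taken from MachineLL.

```lua
-- Hypothetical standalone helpers; not MachineLL functions.
local function euclidean(a, b)
    local sum = 0
    for i = 1, #a do
        sum = sum + (a[i] - b[i]) ^ 2
    end
    return math.sqrt(sum)
end

local function manhattan(a, b)
    local sum = 0
    for i = 1, #a do
        sum = sum + math.abs(a[i] - b[i])
    end
    return sum
end

-- Returns the index of the centroid closest to point under the given distance.
local function nearest(point, centroids, distance)
    local best, bestDistance = nil, math.huge
    for index, centroid in ipairs(centroids) do
        local d = distance(point, centroid)
        if d < bestDistance then
            best, bestDistance = index, d
        end
    end
    return best
end

local centroids = { {3, 3}, {0, 5} } -- assumed trained centroids
local point = {0, 0}

-- Euclidean: about 4.24 to centroid 1, 5 to centroid 2.
local euclideanCluster = nearest(point, centroids, euclidean)
-- Manhattan: 6 to centroid 1, 5 to centroid 2.
local manhattanCluster = nearest(point, centroids, manhattan)
print(euclideanCluster, manhattanCluster)
```

The same point lands in cluster 1 under Euclidean distance and in cluster 2 under Manhattan distance, so centroids trained with one metric may assign points differently when queried with the other.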