## Publisher's description

The Waffles project offers a collection of command-line tools for researchers in machine learning, data mining, and related fields. All of the functionality is also provided in a clean C++ class library, and demo apps are included to show how to use it.

### Some Quick Examples of Using Waffles

The Waffles tools mostly work with .arff files. Let's say you have a database or a spreadsheet that can export to comma-separated values, and you want to convert it to an .arff file:

```
waffles_transform import mydata.csv > mydata.arff
```

You might want to see some quick stats about a dataset:

```
waffles_plot stats IRIS.arff
```

Another quick way to look at a dataset is to view a matrix of pair-wise plots and look for correlated attributes. (Obviously every attribute is correlated with itself, so histograms of the attributes are shown along the diagonal.)

```
waffles_plot overview diabetes.arff
```

Maybe you'll need to tweak your dataset. You can swap columns, fill in missing values, sort by a particular column, shuffle rows, and perform numerous other useful transformations:

```
waffles_transform swapcolumns mydata.arff 0 3
waffles_transform replacemissingvalues mydata.arff
waffles_transform sortcolumn mydata.arff 2
waffles_transform shuffle mydata.arff
```

Now let's do some basic machine learning. We'll use 50x2 cross-validation to test the predictive accuracy of various models on the iris dataset: baseline, a decision tree, an ensemble of 30 decision trees, a 3-NN instance learner, a 5-NN instance learner, naive Bayes, a perceptron, and a neural network with one hidden layer of 4 nodes.
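The 50x2 protocol just described (50 repetitions of 2-fold cross-validation) is not specific to Waffles. As a minimal sketch in plain Python, here is repeated k-fold cross-validation of a stand-in majority-vote learner, roughly analogous in spirit to the baseline model; everything below is illustrative and is not Waffles API:

```python
import random
from collections import Counter

def baseline_fit(labels):
    """A 'baseline' learner: always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]

def cross_validate(labels, reps=50, folds=2, seed=0):
    """Repeated k-fold cross-validation; returns mean predictive accuracy."""
    rng = random.Random(seed)
    n = len(labels)
    correct = total = 0
    for _ in range(reps):
        order = list(range(n))
        rng.shuffle(order)                 # a fresh random partition each repetition
        for f in range(folds):
            test = set(order[f::folds])    # every folds-th shuffled index is held out
            train = [labels[i] for i in range(n) if i not in test]
            prediction = baseline_fit(train)
            correct += sum(1 for i in test if labels[i] == prediction)
            total += len(test)
    return correct / total

# Toy labels: two thirds are "a", so the baseline should score near 2/3.
labels = ["a", "a", "b"] * 10
print(round(cross_validate(labels), 3))
```

A real learner would also receive the feature rows; the baseline model ignores features entirely, which is exactly what makes it a useful floor for comparison.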
(Many other models are available, but are not demonstrated here.)

```
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff baseline
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff decisiontree
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff bag 30 decisiontree end
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff knn 3
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff knn 5
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff discretize naivebayes
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff orthogonalize neuralnet
waffles_learn crossvalidate -reps 50 -folds 2 iris.arff orthogonalize neuralnet -addlayer 4
```

In this example, we will train a neural network with two hidden layers (each with 4 nodes). We will wrap the neural network in an orthogonalization filter so it can handle nominal data as well as continuous data. We will save the trained model to a file (model.twt). Then, we'll load that model from the file and use it to predict labels for all the patterns in a test set:

```
waffles_learn train train.arff orthogonalize neuralnet -addlayer 4 -addlayer 4 > model.twt
waffles_learn evaluate model.twt test.arff > predictions.arff
```

You can use Waffles to plot equations. This is a simple 2D plot of the logistic sigmoid function. (By default, an image named plot.png is generated. You can view it with your favorite image viewer.)

```
waffles_plot equation -range -6 0 6 1 "f1(x) = 1/(1+e^(-x))"
```

Let's plot multiple equations together. Notice that I define a helper function, g(x). Of course you can use common operations like: abs, acos, acosh, asin, asinh, atan, atanh, ceil, cos, cosh, erf, floor, gamma, lgamma, log, max, min, sin, sinh, sqrt, tan, and tanh.
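Most of those operations map directly onto Python's math module, which makes it easy to sanity-check a plot numerically. For instance, a quick tabulation of the logistic sigmoid from the earlier example (plain Python, not Waffles):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Tabulate over the same -6..6 range used by the waffles_plot command above.
for x in range(-6, 7, 2):
    print(f"f1({x:3d}) = {sigmoid(x):.4f}")
```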
You can also overload those operations, define constants, etc.

```
waffles_plot equation -range -10 0 10 10 "f1(x)=log(x^2+1)+2;f2(x)=x^2/g(x)+2;g(m)=10*(cos(m)+pi);f3(x)=sqrt(49-x^2);f4(x)=abs(x)-1"
```

Suppose you want to make a precision-recall graph for an ensemble of 100 random decision trees with the diabetes database. Here's how you could do it. (The horizontal axis shows the recall, and the vertical axis shows the corresponding precision for each of the labels. Blue shows the precision when trying to identify the cases most likely to test negative for diabetes, and red shows the precision when trying to identify the cases most likely to test positive. Apparently, the random forest finds the former task easier.)

```
waffles_learn precisionrecall diabetes.arff bag 100 decisiontree -random end > pr.arff
waffles_plot scatter pr.arff -lines
```

Let's generate 2000 points that lie on a Swiss-roll manifold. Since 3D data can be hard to visualize, we'll plot it from several different points of view:

```
waffles_generate swissroll 2000 -cutoutstar -seed 0 > sr.arff
waffles_plot 3d sr.arff -Blast -pointradius 300
```

Now, let's generate and plot a collection of 2000 points that lie on a self-intersecting ribbon manifold:

```
waffles_generate selfintersectingribbon 2000 -seed 1 > in.arff
waffles_plot 3d in.arff
```

Next, we'll use Manifold Sculpting to learn that self-intersecting ribbon manifold. (We'll use 12 neighbors, 2 target dimensions, an intelligent neighbor-finding algorithm, a shortcut-pruning algorithm, and a slow scaling rate.)

```
waffles_transform manifoldsculpting in.arff 12 2 -smartneighbors -pruneshortcuts -scalerate 0.9995 -seed 0 > out.arff
waffles_plot scatter out.arff -spectrum -pointradius 5 -nohorizaxislabels -novertaxislabels
```

We'll draw 1 million random values from a gamma distribution (alpha=9, beta=2) and then plot a
histogram of those values. Other supported distributions include: beta, binomial, cauchy, chisquare, exponential, f, gamma, gaussian, geometric, logistic, lognormal, normal, poisson, softimpulse, spherical, student, uniform, and weibull.

```
waffles_generate noise 1000000 -seed 0 -dist gamma 9 2 > gamma.arff
waffles_plot histogram gamma.arff
```

Waffles supports many useful filters. For example, if you have high-dimensional data but your algorithm works better with low-dimensional data, filter it through "pca". If you have real-valued data but your algorithm only supports discrete values, filter it through "discretize". If your algorithm only supports real values but you have nominal data, filter it with "orthogonalize". If your data is not within the ideal range, filter it with "normalize". These filters work in both directions, and you can specify whether they apply to features, labels, or both.

```
waffles_learn crossvalidate data.arff pca 7 knn 5
waffles_learn crossvalidate data.arff discretize naivebayes
waffles_learn crossvalidate data.arff orthogonalize meanmargins
waffles_learn crossvalidate data.arff normalize -range 0 1 somealgorithm
```

If you don't know which algorithm to use, but you've got cycles to burn, cross-validation-selecting ensembles are a powerful choice. For really strong results, you can even make a cv-select ensemble of bagging ensembles.

```
waffles_learn splittest -trainratio 0.3 cvselect knn 5 orthogonalize neuralnet decisiontree discretize naivebayes end
waffles_learn splittest -trainratio 0.2 bag 50 cvselect decisiontree meanmarginstree end end
```

Some algorithms have no internal model.
You cannot train such algorithms, but you can still measure their predictive accuracy:

```
waffles_learn transacc train.arff test.arff agglomerativetransducer
waffles_learn transacc train.arff test.arff graphcuttransducer 5
waffles_learn transacc train.arff test.arff neighbortransducer 5
```

Matrix operations are also supported. For example, let's compute C = AᵀB† (the transpose of A multiplied by the pseudoinverse of B):

```
waffles_transform transpose a.arff > a_trans.arff
waffles_transform pseudoinverse b.arff > b_inv.arff
waffles_transform multiply a_trans.arff b_inv.arff > c.arff
```

...and lots more.

The Waffles class library also has a lot of functionality that is not yet available through the command-line tools. Here is an incomplete list of some of the things it can do:

* Agent Algorithms
* ARFF Tools
* Bagging
* Calibration
* Chess
* Clustering
* Cross-Validation Selection
* Data Augmentation
* Data Mining Tools
* Decision Trees
* Demos
* Evolutionary Optimizer
* Fourier Transform
* Gaussian Mixture Model
* Graph Cut
* GUI Tools
* Hidden Markov Models
* Hill Climbers
* Hierarchical Region Adjacency Graphs
* Image Processing Tools
* kd-Tree
* k-Means
* k-NN Instance Learner
* Linear Regression
* Manifold Learning
* Multivariate Polynomials
* MCMC for Belief Networks
* Naive Bayes
* Neural Network
* Particle Swarm
* Plotting
* Precision/Recall
* Principal Component Analysis
* Q-Learning
* Ray Tracer
* Self-Organizing Map
* Significance Testing
* Socket Wrappers
* Stemmer

...and more (see the documentation).

### What's New in This Release

* Added Locally-Linear Embedding (LLE) to the transform tool and improved the Breadth-First Unfolding manifold learning algorithm.
* Added the Kabsch algorithm for aligning data.
* Added singular value decomposition to the transform tool.
* Improved API docs.
* Further simplified the learning interface.
* Repaired some regressions with serialization.
* Added several unit tests.