Saturday 4 March 2017

Machine Learning! Where To Start? - Tools

Popular Tools in 5 points

Octave

  1. This is an Open Source project.
  2. Makes it easy to write complex machine learning equations.
  3. Vectorization makes matrix operations easy to write.
  4. Python-like CLI.
  5. Limited to small data sets (it cannot handle big data).
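
To show what vectorization buys you, here is a minimal sketch. It uses Python with NumPy rather than Octave, and the data and function names are made up for illustration. Both functions compute the same predictions, one with explicit loops and one with a single matrix-vector product.

```python
import numpy as np

# Hypothetical toy data: 3 examples, an intercept column plus one feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
theta = np.array([0.5, 2.0])


def predict_loop(X, theta):
    """Compute each prediction with explicit Python loops."""
    predictions = []
    for row in X:
        total = 0.0
        for x_j, theta_j in zip(row, theta):
            total += x_j * theta_j
        predictions.append(total)
    return np.array(predictions)


def predict_vectorized(X, theta):
    """Compute all predictions with a single matrix-vector product."""
    return X @ theta


loop_result = predict_loop(X, theta)
vectorized_result = predict_vectorized(X, theta)
```

The vectorized version reads almost like the equation itself, which is the appeal of this style in Octave as well.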

R

  1. This is an Open Source project.
  2. R can create graphics to be displayed on the screen or saved to file. It can also prepare models that can be queried and updated.
  3. R is a tool to use when you need to analyze data, plot data or build a statistical model for data.
  4. Built around a statistics-centric design for computation.
  5. This blog post covers almost everything: http://machinelearningmastery.com/what-is-r/

Python 

  1. NumPy and Pandas are the two most recommended libraries to get you started with data manipulation and some statistical calculations.
  2. IPython Notebook is also gaining popularity nowadays.
  3. scikit-learn and TensorFlow are machine learning libraries that make model building simpler for everyone.
  4. Python is the more popular choice among programmers.
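
As a rough idea of what data manipulation and simple statistics look like with NumPy and Pandas, here is a small sketch. The data set and column names are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data set; the column names are illustrative only.
df = pd.DataFrame({
    "size_sqft": [800.0, 1200.0, 1500.0, 2000.0],
    "price": [100.0, 150.0, 180.0, 260.0],
})

# Data manipulation: derive a new column from existing ones.
df["price_per_sqft"] = df["price"] / df["size_sqft"]

# Statistical calculations: a column mean and the correlation
# between two columns.
mean_price = df["price"].mean()
correlation = np.corrcoef(df["size_sqft"], df["price"])[0, 1]
```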

Apache Mahout 

  1. Open source, scalable machine learning platform.
  2. Runs a machine learning algorithm as multiple MapReduce jobs.
  3. It is built on top of Hadoop.
  4. It uses batch processing.
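
To make the idea of running an algorithm as several map-reduce jobs concrete, here is a single-machine sketch in plain Python. It is not Mahout's API and does not use Hadoop. It only illustrates the pattern, assuming one-dimensional k-means as the algorithm; all function names are hypothetical.

```python
from functools import reduce


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: abs(point - centroids[i]))
    return nearest, (point, 1)


def reduce_pairs(acc, pair):
    """Reduce step: accumulate a per-cluster sum and count."""
    key, (value, count) = pair
    total, n = acc.get(key, (0.0, 0))
    acc[key] = (total + value, n + count)
    return acc


def kmeans_iteration(points, centroids):
    """One k-means update: a map phase followed by a reduce phase."""
    mapped = [map_point(p, centroids) for p in points]
    sums = reduce(reduce_pairs, mapped, {})
    # Keep the old centroid if no point was assigned to it.
    return [sums[i][0] / sums[i][1] if i in sums else c
            for i, c in enumerate(centroids)]


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [0.0, 5.0]
for _ in range(3):  # each pass stands in for one map-reduce job
    centroids = kmeans_iteration(points, centroids)
```

Each pass over the data corresponds to one job, which is why an iterative algorithm ends up as a chain of jobs.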

Apache Spark

  1. Open source big data platform.
  2. MLlib is its machine learning library.
  3. It processes data in memory, which is why it is faster than Mahout.
  4. It supports micro-batch processing.
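
Micro-batch processing can be pictured with a plain Python sketch like the one below. It does not use Spark at all. The stream, the batch size and the helper name are assumptions made for illustration. The point is that state stays in memory and is updated once per small batch.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield consecutive lists of up to batch_size items from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stream of sensor readings.
readings = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 10.0]

count = 0
total = 0.0
running_means = []
for batch in micro_batches(readings, batch_size=3):
    # State is kept in memory and updated once per batch,
    # not once per record.
    count += len(batch)
    total += sum(batch)
    running_means.append(total / count)
```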

Summary

Hopefully, this helps you make the right choice. If you would like to understand the mathematics of machine learning, I would recommend using Octave or Python to implement machine learning algorithms without any library. If your objective is only to apply these algorithms, you may start with Python and scikit-learn; there are plenty of tutorials on the Internet to get you started. Other popular libraries are Apache Apex-SAMOA, H2O, Flink, Weka, Java-ML, etc. Please comment if you would like to see some other comparisons.
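
For a taste of implementing an algorithm without any library, here is a minimal sketch of linear regression fitted by batch gradient descent in plain Python. The data, learning rate and iteration count are assumptions chosen so that this toy example converges; treat it as a starting point rather than a reference implementation.

```python
def fit_line(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction error for every example with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
```

Writing the gradient by hand like this is what exposes the mathematics that a library call would otherwise hide.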