Hyperight

Declarative Machine Learning With Apache SystemML – Shirish Tatikonda

Bookmark
Bookmark

The rising need for custom machine learning algorithms that scale to large data sizes on distributed runtime platforms pose significant productivity challenges to data scientists. Towards this end, I will present the philosophy and the architecture behind Apache SystemML. It enables declarative machine learning by letting data scientists specify their 00analytics using higher-level linear algebraic and mathematical constructs. These analytics scripts are transparently compiled into efficient execution plans running on modern distributed data-parallel frameworks, such as Apache Spark, Apache Hadoop, and Apache Flink.

Questions:

  • Domain-specific custom analytics: through a ML language vs. a library of pre-canned implementations
  • High-productivity for data scientists: through specification of what needs to be done instead of how it needs to be done
  • Write-once- run-everywhere: with support for single-node, in-memory, and distributed computation platforms
  • Automatic optimization and parallelization: through cost-based execution plans based on data, computation, and system characteristics

Add comment