This book explores and motivates the need for building homogeneous and heterogeneous multi-core systems for machine learning to enable flexibility and energy-efficiency. Coverage focuses on a key aspect of the challenges of (extreme-)edge-computing, i.e., design of energy-efficient and flexible hardware architectures, and hardware-software co-optimization strategies to enable early design space exploration of hardware architectures. The authors investigate possible design solutions for building single-core specialized hardware accelerators for machine learning and motivates the need for building homogeneous and heterogeneous multi-core systems to enable flexibility and energy-efficiency. The advantages of scaling to heterogeneous multi-core systems are shown through the implementation of multiple test chips and architectural optimizations.