In this talk, Óscar Martínez will review the usual workloads found in Cloudera clusters: data lakes, data warehouses, operational databases, search engines, data engineering, real-time & streaming and, last but not least, data science. For each type of workload, he will share lessons learned from recent implementations. He will review how to size a large cluster for multi-workload operation, and how to organize data lakes and data warehouses to ease data engineering operations and get the best analytical query performance. He will also show how to estimate query response times when designing and sizing a cluster. This lever is usually overlooked during sizing (often only capacity is considered), but it can be critical to avoid unexpected expenses when the cluster turns out to need to be larger than originally planned to meet query performance SLAs.
Óscar will also discuss how to leverage search engines in Cloudera and how combining them with other Cloudera services offers key advantages. He will also review best practices for developing data engineering and real-time & streaming applications on Cloudera, considering recent additions to the stack such as NiFi. Finally, he will review the options available in Cloudera for developing ML & AI solutions.
#BIGTH19 #AI #Analytics #BigData #Cloud
Session presented at Big Things Conference 2019 by Óscar Martínez Rubi, Principal Business Intelligence, Big Data Consultant & Project Manager at ClearPeaks.
20th November 2019
Do you want to know more? https://www.bigthingsconference.com/