Course content:
- Introduction to big data technologies
- Big data analytics in Hadoop
- Big data analytics in the Python programming language
- Real-time big data analytics
Big data infrastructure. Big data services. Big data as part of cloud infrastructure. Non-relational databases. Parallel search and processing of data. MapReduce. Designing big data solutions for companies. Big data and knowledge discovery in data.
Hadoop framework for big data. Hadoop tools. HBase databases. HiveQL queries. Using the Pig tool to write MapReduce programs. Apache Impala analytical database. Using the Ambari tool to monitor Hadoop job execution. Mahout library for machine learning.
Using Python for basic data manipulation and data analysis. Basics of machine learning and the use of Python and its packages in data analytics and machine learning. Packages covered: NumPy, Pandas, Bokeh, Agate, SciPy.
Big data analytics in real time. Big data analytics and artificial intelligence. Streaming data. Apache Spark. Development and management of Spark applications. Machine learning in Apache Spark. Data visualization.
The course is free of charge for all participants. Theoretical and practical classes will be held in room 304 at the Faculty of Organizational Sciences. All participants are required to complete a project in one of the areas covered by the course.
Each participant who attends the classes and successfully completes the final project receives a certificate at the end of the course. For current and future students of the Faculty of Organizational Sciences, the certificate counts as completing part of the pre-exam obligations in one of the following Elab subjects, depending on the level of study:
- Internet marketing – undergraduate studies
- Big Data in E-business – master studies
- Internet marketing: selected chapters – particular postgraduate studies
- Big Data infrastructure and services: selected chapters – PhD studies