Features
Description
InfoShare Academy is a leading IT academy offering comprehensive educational programs in new technologies for companies. Since 2015, we have been supporting organizations in developing technology teams through dedicated courses in Machine Learning, DevOps, Data Engineering, Python, UX/UI Design, AWS, and Kubernetes. Our training is based on practical skills and real business cases. We collaborate with over 300 industry practitioners, ensuring that our programs are tailored to current market needs. We specialize in reskilling and upskilling employees. With us, you will build effective teams implementing new technologies that will accelerate innovation and strengthen your company's competitiveness in the market. Check out our training offerings designed for companies, created to develop your employees' competencies in the IT field.
PySpark is the Python API for Apache Spark. It lets you write distributed jobs in Python and run them on a cluster. PySpark exposes Spark's distributed data abstractions and gives access to core Spark operations such as mapping, filtering, grouping, and aggregating data. It is widely used in Big Data processing, data analysis, and machine learning.
- For programmers familiar with Python
- For those who want to learn one of the most popular data processing tools
- For analysts with knowledge of Python
- For data scientists
- You will learn about the application of Big Data in organizations
- You will understand the fundamentals of working with data in Apache Spark
- You will learn Spark Core and Spark SQL
- You will discover how to use Spark ML in practical applications
Module 1 – Apache Spark Architecture
What's what: Spark's role in the organization
Spark's place in the Big Data landscape
Module 2 – RDDs
Fundamentals of working with data in Apache Spark
Module 3 – Differences between Python syntax and PySpark
RDD vs. pandas DataFrame
Module 4 – Variables, partitioning, and other Spark Core topics
Module 5 – Spark SQL
Working with DataFrames
Syntax
Schemas
Aggregations
Module 6 – Spark ML
Module 7 – Prototyping
Module 8 – Running and managing jobs on a cluster
Module 9 – Testing processes
Module 10 – Optimizing and configuring jobs
Module 11 – Spark Structured Streaming
Module 12 – Q&A Session
32 hours / 4 days
- Certificate of completion
- One month of access to the training recording (for online format)
- Customization of the training program to client needs