Description
InfoShare Academy is a leading IT academy offering comprehensive educational programs in new technologies for companies. Since 2015, we have supported organizations in developing technology teams through dedicated courses in Machine Learning, DevOps, Data Engineering, Python, UX/UI Design, AWS, and Kubernetes. Our training is based on practical skills and real business cases. We collaborate with over 300 industry practitioners, ensuring that our programs are tailored to current market needs. We specialize in reskilling and upskilling employees. With us, you will build effective teams implementing new technologies that will accelerate innovation and strengthen your company's competitiveness in the market. Check out our training offerings designed for companies, created to develop your employees' competencies in the IT field.
- The Apache Spark training is an intensive two-day course focused on the practical application of this popular framework for processing large datasets. The training program is designed so that 80% of the time is dedicated to practical workshops and 20% to theory. Participants will gain solid theoretical foundations and practical skills in using Apache Spark, working with real data and solving practical problems.
- Required technical skills:
  - Basic programming knowledge in Python or Scala
  - Basic knowledge of data processing
  - Ability to work in a Unix/Linux environment
- Who the training is for:
  - Programmers and data engineers who want to expand their skills with Apache Spark
  - Data scientists and data analysts who want to process large datasets efficiently
  - IT specialists and big data professionals who want to use Apache Spark in their projects
- You will learn:
  - How to install and configure Apache Spark in various environments
  - How to process and analyze data using RDD, DataFrame, and Spark SQL
  - How to optimize queries and manage resources in Apache Spark
  - How to deploy Apache Spark applications in a production environment
Day 1: Introduction to Apache Spark and basics of data processing
- Introduction to Apache Spark
  - History and development of Apache Spark
  - Architecture and main components (RDD, DataFrame, Spark SQL)
- Installation and configuration of the environment
  - Installation of Apache Spark and its dependencies
  - Configuration of the working environment (Standalone, Hadoop, AWS)
- Basics of data processing in Apache Spark
  - Working with files: JSON, CSV, XML, TXT, Parquet, Avro
  - Transformations and actions: the principle of lazy evaluation
Day 2: Advanced techniques and practical applications
- Advanced data processing using DataFrame and Spark SQL
  - Creating and managing DataFrames
  - Using Spark SQL to query large datasets
- Data transformation
  - Sorting, grouping, and filtering data
  - Data transformations using map, flatMap, and user-defined functions (UDFs)
  - Window (analytical) functions
- Workshop: Processing and analyzing data using DataFrames
  - Implementing DataFrame operations and SQL queries
  - Analyzing large datasets using Spark SQL
- Optimization and performance tuning
  - Optimization techniques for queries and Spark operations
  - Memory management and resource allocation
  - Partitioning and writing data
- Deploying Apache Spark applications
  - Preparing and exporting Spark applications
  - Deploying applications in a production environment
Duration: 16 hours over 2 days
- Certificate of completion
- One month of access to the training recording (for the online format)
- Customization of the training program to client needs