Features
Certification:
  • Yes
Dedicated training:
Number of training hours:
  • 16
Producer:
Training language:
  • Polish
Training level:
  • Basic
Type of training:
  • in-person; online; workshop

Description

Company Description

InfoShare Academy is a leading IT academy offering comprehensive educational programs in new technologies for companies. Since 2015, we have supported organizations in developing technology teams through dedicated courses in Machine Learning, DevOps, Data Engineering, Python, UX/UI Design, AWS, and Kubernetes. Our training focuses on practical skills and real business cases. We collaborate with over 300 industry practitioners, which keeps our programs aligned with current market needs. We specialize in reskilling and upskilling employees. With us, you can build effective teams that implement new technologies, accelerating innovation and strengthening your company's competitive position. Explore our training offer for companies, designed to develop your employees' IT competencies.

Training Description
  • The Apache Spark training is an intensive two-day course focused on applying this popular framework to the processing of large datasets. The program devotes 80% of the time to hands-on workshops and 20% to theory. Participants gain a solid theoretical foundation and hands-on Apache Spark skills by working with real data and solving practical problems.
Required technical skills:
  • Basic programming knowledge in Python or Scala
  • Basic knowledge of data processing
  • Ability to work in a Unix/Linux environment
Who the training is for
  • Programmers and data engineers who want to expand their skills with Apache Spark
  • Data scientists and data analysts wishing to process large datasets efficiently
  • IT specialists and big data professionals who want to use Apache Spark in their projects
Benefits
You will learn:
  • How to install and configure Apache Spark in various environments
  • How to process and analyze data using RDD, DataFrame, and Spark SQL
  • How to optimize queries and manage resources in Apache Spark
  • How to deploy Apache Spark applications in a production environment
Training Program

Day 1: Introduction to Apache Spark and basics of data processing

  1. Introduction to Apache Spark

  • History and development of Apache Spark

  • Architecture and main components (RDD, DataFrame, Spark SQL)

  2. Installation and configuration of the environment

  • Installation of Apache Spark and dependencies

  • Configuration of the working environment (Standalone, Hadoop, AWS)

  3. Basics of data processing in Apache Spark

  • Working with files: JSON, CSV, XML, TXT, Parquet, Avro

  • Transformations and actions: the principle of lazy evaluation

Day 2: Advanced techniques and practical applications

  1. Advanced data processing using DataFrame and Spark SQL

  • Creating and managing DataFrames

  • Using Spark SQL for queries on large datasets

  2. Data transformation

  • Sorting, grouping, and filtering data

  • Data transformations using map, flatMap, and user-defined functions (UDFs)

  • Window (analytic) functions

  3. Workshop: Processing and analyzing data using DataFrames

  • Implementing DataFrame operations and SQL queries

  • Analyzing large datasets using Spark SQL

  4. Optimization and performance tuning

  • Techniques for optimizing queries and Spark operations

  • Memory management and resource allocation

  • Partitioning and writing data

  5. Deploying Apache Spark applications

  • Preparing and exporting Spark applications

  • Deploying applications in a production environment

Duration

16 hours / 2 days

Price includes
  • Certificate of completion
  • One month of access to the training recording (online format only)
  • Customization of the training program to the client's needs