Features

Certification:
  • Yes
Dedicated training:
Number of training hours:
  • 32
Producer:
Training language:
  • Polish
Training level:
  • Intermediate
Type of training:
  • On-site; online

Description

Company Description

InfoShare Academy is a leading IT academy offering comprehensive educational programs in new technologies for companies. Since 2015, we have been supporting organizations in developing technology teams through dedicated courses in Machine Learning, DevOps, Data Engineering, Python, UX/UI Design, AWS, and Kubernetes. Our training is based on practical skills and real business cases. We collaborate with over 300 industry practitioners, ensuring that our programs are tailored to current market needs. We specialize in reskilling and upskilling employees. With us, you will build effective teams implementing new technologies that will accelerate innovation and strengthen your company's competitiveness in the market. Check out our training offerings designed for companies, created to develop your employees' competencies in the IT field.

Training Description

PySpark is the Python API for Apache Spark. It lets you write and run distributed jobs on clusters in Python. PySpark provides an API for working with Spark's distributed data and gives access to Spark features such as mapping, filtering, grouping, and aggregating data. It is widely used in Big Data processing, data analysis, and machine learning.

Who the Training is For
  • Programmers familiar with Python
  • Anyone who wants to learn one of the most popular data processing tools
  • Analysts who know Python
  • Data scientists
Goals

 

Benefits
  • You will learn about the application of Big Data in organizations
  • You will understand the basic issues related to working with data in Apache Spark
  • You will learn Spark Core and Spark SQL
  • You will discover how to use Spark ML in practical applications
Training Program
  • Module 1 – Apache Spark Architecture

    • What does what in the organization

    • Where Spark fits in the Big Data landscape

  • Module 2 – RDDs

    • Basic issues related to working with data in Apache Spark

  • Module 3 – Differences between Python and PySpark syntax

    • RDD vs Pandas DataFrame

  • Module 4 – Variables, partitioning, and other Spark Core topics

  • Module 5 – Spark SQL

    • Working with DataFrames

    • Syntax

    • Schemas

    • Aggregations

  • Module 6 – Spark ML

  • Module 7 – Prototyping

  • Module 8 – Running and managing tasks on a cluster

  • Module 9 – Testing processes

  • Module 10 – Optimization and configuration of tasks

  • Module 11 – Spark Structured Streaming

  • Module 12 – Q&A Session

Duration

32 hours / 4 days

Price Includes
  • Certificate of completion
  • One month of access to the training recording (for the online format)
  • Customization of the training program to client needs

Order the training