Features

Certification:
  • Yes
Dedicated training:
Number of training hours:
  • 24
Producer:
Training language:
  • Polish
Training level:
  • Intermediate
Type of training:
  • On-site; online

Description

Company Description

InfoShare Academy is an IT academy offering comprehensive educational programs in new technologies for companies. Since 2015, we have supported organizations in developing their technology teams through dedicated courses in Machine Learning, DevOps, Data Engineering, Python, UX/UI Design, AWS, and Kubernetes. Our training is built around practical skills and real business cases. We work with over 300 industry practitioners, so our programs stay aligned with current market needs. We specialize in reskilling and upskilling employees. With us, you will build effective teams that adopt new technologies, accelerate innovation, and strengthen your company's competitiveness. Explore our training offer for companies, designed to develop your employees' IT skills.

Training Description

Azure Databricks is a big data service built on Apache Spark for preparing data, training models, and exploring data in the cloud. The platform is designed for scalability, performance, and ease of use, and it makes it easier for teams to coordinate their work and share code.

Who the Training is For
  • For individuals who want to use data to optimize processes.
  • For those who want to better understand Apache Spark.
  • For individuals with basic knowledge of data analysis.
  • For programmers, Data Engineers, and Data Scientists.
Goals

 

Benefits
  • You will learn the fundamentals of the Azure Databricks platform.
  • You will learn data processing and preparation.
  • You will learn how to analyze data with Databricks SQL.
  • You will learn to use Apache Spark.
Training Program
  • What is the Databricks Lakehouse Platform

    • Describe what the Databricks Lakehouse Platform is

    • Explain the origin of the Lakehouse data management paradigm

    • Outline fundamental challenges related to managing and using data

    • Describe security features of the Databricks Lakehouse Platform

    • Give examples of organizations that have benefited from using the Databricks Lakehouse Platform

  • What is Databricks SQL

    • Summarize fundamental concepts for using Databricks SQL effectively

    • Identify tools and features in Databricks SQL for querying data and sharing insights

    • Explain how Databricks SQL supports data analysis workflows that allow users to extract and share business insights

  • What is Databricks Machine Learning

    • Give a basic overview of Databricks Machine Learning

    • Identify how using Databricks Machine Learning benefits data science and machine learning teams

    • Summarize the fundamental components and functionalities of Databricks Machine Learning

    • Give examples of successful Databricks Machine Learning use cases from real Databricks customers

  • What is Databricks Data Science and Engineering Workspace

    • Give a basic overview of Databricks Data Science and Engineering Workspace

    • Identify assets provided by the workspace

    • Describe a simple development workflow that queries and aggregates data

  • Databricks Workspaces and Services

    • Databricks architecture and services

    • Data Science and Engineering Workspace

    • Create and manage interactive clusters

    • Notebook basics

    • Git versioning with Databricks Repos

    • Using Databricks Repos

    • Getting started with the Databricks Platform

  • Delta Lakehouse

    • What is Delta Lake

    • Managing Delta Tables

    • Manipulating tables with Delta Lake

    • Advanced Delta

  • Relational Entities on Databricks

    • Databases and Views

    • Views and CTEs

  • ETL with Spark SQL

    • Query files directly

    • Providing options

    • Creating Delta Tables

    • Writing to tables

    • Cleaning data

    • Advanced SQL transformations

    • User-defined functions (UDFs)

  • Getting Started with Databricks SQL

    • Getting started with Databricks SQL

    • Navigating Databricks SQL

    • Unity Catalog on Databricks SQL

    • Schemas, tables and views on Databricks SQL

  • Basic SQL on Databricks SQL

    • Ingesting data for Databricks SQL

    • Ingesting data

    • Joins

    • Delta commands in Databricks SQL

  • Presenting Data Visually

    • Data visualization

    • Data visualizations on Databricks SQL

    • Dashboards on Databricks SQL

    • Notifying stakeholders

  • Apache Spark Programming – DataFrames

    • Databricks platform

    • Databricks ecosystem

    • Spark SQL

    • DataFrames

    • SparkSession

    • Reader and writer

    • Data sources

    • DataFrame and column

    • Column and expression

    • Transformations, actions and rows

  • Apache Spark Programming – Transformations

    • Aggregation

    • Aggregation functions

    • Datetimes

    • Dates and timestamps

    • Complex types

    • Additional functions

    • UDFs

    • Vectorized UDFs

  • Apache Spark Programming – Spark Internals

    • Spark architecture

    • Spark cluster, Spark execution

    • Shuffling and caching

    • Query optimization

    • Partitioning

  • Apache Spark Programming – Structured Streaming

    • Apache Spark programming

    • Streaming concepts

Duration

24 hours / 3 days

Price Includes
  • Certificate of completion
  • One month of access to training recordings (for the online format)
  • Customization of the training program to client needs
