Hello, I'm

Mohammed


Engineered ETL pipelines processing 122K+ records — distributed with PySpark, containerized in Docker, and delivered as Power BI dashboards.

Python Apache Spark PostgreSQL Docker Apache Airflow Power BI Pandas SQL Server SSIS Azure Cloud

About Me

I'm a university student with a deep fascination for leveraging data to solve complex problems. Driven by the potential of Artificial Intelligence and Data Science to shape the future, I've been building hands-on experience with the tools and thinking that power modern data systems.

My expertise spans the full data lifecycle — from ingesting and cleaning messy real-world data, to building automated pipelines, to generating actionable insights through visualization. I believe the best insights come from pipelines you can trust, and I write code with that reliability in mind.

I'm currently progressing through the Data Science career track on DataCamp, and I was selected for the Digital Egypt Pioneers Initiative (DEPI), a prestigious traineeship under the Ministry of Communications and Information Technology focused on Data Engineering, AI, and Big Data Processing.


My Skills

The tools and technologies I use to build reliable data solutions.

Language, Data / DB, Viz / BI, Infra / Cloud


My Education

Where I built my foundation — from research-driven STEM to hands-on computer science.

University (Current)

Egypt-Japan University of Science & Technology

B.Sc. Computer Science

"Learning by Doing" — a Japanese-inspired, research-oriented curriculum emphasizing practical application from day one.

Relevant Coursework
  • Data Structures & Algorithms
  • Database Systems
  • Web Development
  • Computer Architecture (MIPS/Assembly)
  • Advanced Mathematics
Python SQL Algorithms Web Dev Assembly Research

High School (Graduate)

Beni Suef STEM School

Science, Technology, Engineering & Mathematics

Part of a highly selective national network for gifted students. Curriculum built around solving Egypt's "Grand Challenges" through scientific research and engineering prototypes.

Key Outcomes
  • Rigorous foundation in the scientific method
  • Collaborative engineering & rapid prototyping
  • Project-Based Learning (PBL)
  • Research & technical writing
STEM Research Prototyping PBL Engineering

My Experience

2024 — Present

Digital Egypt Pioneers Initiative (DEPI)

Ministry of Communications & IT (MCIT)

Selected for a prestigious national traineeship focused on building expertise in core Data Engineering, AI, and Big Data Processing methodologies.

  • Advanced Python for Data Engineering & scalable software foundations
  • SQL, Database Management & Microsoft Azure Data Engineer concepts
  • Data Pipeline design, lifecycle management & Big Data processing
  • Prompt Engineering & AI for Data Engineers
  • Comprehensive Capstone project — a portfolio-ready, real-world deployment
Python SQL Azure Big Data AI
2024 — Present

Data Science Career Track

DataCamp

Actively building practical data science fluency through hands-on courses covering analysis, visualization, and machine learning fundamentals.

Pandas Seaborn Data Analysis Visualization

Featured Projects

Café Sales ETL Pipeline

A full Extract → Transform → Load pipeline that cleans dirty data, loads it into SQL Server, and generates visual reports.

Python Pandas SQL Server

Pipeline Architecture

Extract: Raw CSV ingestion
Transform: Clean, impute, validate
Load: SQL Server via SQLAlchemy
Visualize: Matplotlib & Seaborn reports

Key Techniques

  • Algorithmic back-calculation for data recovery
  • Forward-fill imputation for categorical data
  • Dual-path cleaning (Deletion vs. Repair)
  • Automated verification reporting
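
As a rough illustration of the back-calculation and forward-fill techniques listed above, here is a minimal plain-Python sketch. It is not the project's actual code: the field names (`quantity`, `unit_price`, `total`) and the identity `total = quantity * unit_price` are assumptions made for the example, and the real pipeline uses Pandas rather than hand-written loops.

```python
def back_calculate(quantity, unit_price, total):
    """Recover one missing value (None) from the assumed identity
    total = quantity * unit_price. Rows with zero or several missing
    values are returned unchanged."""
    missing = sum(v is None for v in (quantity, unit_price, total))
    if missing != 1:
        return quantity, unit_price, total
    if total is None:
        total = quantity * unit_price
    elif quantity is None:
        if unit_price == 0:  # cannot divide; leave the row for the deletion path
            return quantity, unit_price, total
        quantity = total / unit_price
    else:
        if quantity == 0:
            return quantity, unit_price, total
        unit_price = total / quantity
    return quantity, unit_price, total


def forward_fill(values):
    """Replace each None with the most recent non-None value seen so far.
    Leading None values stay None because there is nothing to carry forward."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical usage: recover a missing unit price, then fill a categorical column.
recovered = back_calculate(2, None, 10.0)
payment_methods = forward_fill(["Cash", None, "Card", None])
```

Rows that cannot be repaired this way (more than one missing value) would fall to the deletion path of the dual-path cleaning approach.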

Impact Numbers

90% Data Recovery Rate
$55K Revenue Reclaimed
~7K Rows Saved
Python Pandas NumPy SQLAlchemy PyODBC Matplotlib Seaborn SQL Server

Telecom CDR Data Warehouse

An SSIS ETL pipeline that processes telecom Call Detail Records from CSV flat files into a SQL Server data warehouse — with lookup transforms, IMEI parsing, and error handling.

SSIS SQL Server Star Schema

Pipeline Architecture

Batch Input: Foreach Loop over CSVs
Lookup: IMSI → subscriber_id
Transform: Derive TAC & SNR from IMEI
Load: fact_transaction + error log

Key Techniques

  • Foreach Loop Container for batch CSV processing
  • Dimension lookup joins (IMSI reference table)
  • Derived Column transforms (TAC/SNR from IMEI)
  • Error row redirection & auditing
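
The lookup, derive, and redirect steps above can be sketched in plain Python. This is only an illustration of the logic, not the SSIS package itself: it assumes the standard IMEI layout (8-digit TAC followed by a 6-digit SNR), and the function names, row keys, and sample identifiers are hypothetical.

```python
def split_imei(imei):
    """Return the TAC (first 8 digits) and SNR (next 6 digits) of an IMEI,
    or None if the value is too short or not numeric."""
    digits = str(imei or "").strip()
    if len(digits) < 14 or not digits[:14].isdigit():
        return None
    return {"tac": digits[:8], "snr": digits[8:14]}


def route_rows(rows, imsi_lookup):
    """Split rows into those ready to load and those redirected to an error log.
    A row is redirected when its IMEI cannot be parsed or its IMSI has no
    match in the reference lookup."""
    loaded, errors = [], []
    for row in rows:
        parts = split_imei(row.get("imei"))
        subscriber_id = imsi_lookup.get(row.get("imsi"))
        if parts is None or subscriber_id is None:
            errors.append(row)
            continue
        loaded.append({**row, **parts, "subscriber_id": subscriber_id})
    return loaded, errors


# Hypothetical sample data: one good row, one unknown IMSI, one malformed IMEI.
lookup = {"602010000000001": 42}
rows = [
    {"imsi": "602010000000001", "imei": "490154203237518"},
    {"imsi": "unknown", "imei": "490154203237518"},
    {"imsi": "602010000000001", "imei": "bad"},
]
loaded, errors = route_rows(rows, lookup)
```

Keeping the rejected rows in their original form, rather than dropping them, is what makes it possible to audit every error afterwards.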

Architecture Highlights

3 Star Schema Tables
10+ Batch Files Processed
100% Error Rows Captured
SSIS SQL Server T-SQL Data Warehouse Star Schema Visual Studio

Steam Games Market Analysis

An end-to-end Big Data pipeline and BI suite that ingests, refines, and visualizes 122,000+ Steam marketplace records using a containerized ELT architecture that follows the Medallion pattern.

PySpark Docker Power BI

Medallion Architecture

Bronze: Raw CSV ingestion
Silver: Schema realignment & validation
Gold: Feature engineering & metrics
Power BI: 3 interactive dashboards

Engineering Challenges Solved

  • Fixed a 13-column structural misalignment via index-based realignment
  • Casting layer to resolve Spark-to-PostgreSQL type mismatches
  • Custom derived metrics (est. revenue, sentiment %, age rating)
  • Full Docker containerization (Spark + HDFS + PostgreSQL)
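
To show how the Bronze, Silver, and Gold layers relate, here is a minimal plain-Python stand-in for the PySpark jobs. It is a sketch under stated assumptions, not the project's code: the field names (`name`, `price`, `positive`, `negative`) are hypothetical, and the sentiment formula (positive reviews as a share of all reviews) is my guess at one reasonable definition of "sentiment %".

```python
def to_silver(bronze_rows):
    """Silver layer: cast raw string fields to typed values and drop rows
    that fail validation (missing keys or unparseable numbers)."""
    silver = []
    for row in bronze_rows:
        try:
            silver.append({
                "name": row["name"].strip(),
                "price": float(row["price"]),
                "positive": int(row["positive"]),
                "negative": int(row["negative"]),
            })
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # a real pipeline would log or quarantine the row
    return silver


def to_gold(silver_rows):
    """Gold layer: add a derived metric, here an assumed sentiment percentage."""
    gold = []
    for row in silver_rows:
        reviews = row["positive"] + row["negative"]
        sentiment_pct = round(100 * row["positive"] / reviews, 1) if reviews else None
        gold.append({**row, "sentiment_pct": sentiment_pct})
    return gold


# Hypothetical Bronze rows: everything arrives as raw strings from CSV.
bronze = [
    {"name": " Game A ", "price": "9.99", "positive": "80", "negative": "20"},
    {"name": "Broken", "price": "n/a", "positive": "1", "negative": "0"},
]
gold = to_gold(to_silver(bronze))
```

Casting to explicit types in the Silver step is also where a Spark-to-PostgreSQL casting layer would naturally sit, since the database needs well-defined column types before the load.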

Scale & Output

122K+ Records Processed
3 Power BI Dashboards
6+ Docker Containers
PySpark Docker PostgreSQL HDFS Power BI Jupyter JDBC Medallion Architecture

What I Can Do

From raw data to actionable dashboards — here's how I can help.

ETL Pipelines

End-to-end Extract, Transform, Load pipelines that ingest messy data, clean and reshape it, and deliver it reliably to databases — fully automated and reproducible.

Python Pandas SQLAlchemy

Data Cleaning & Wrangling

Turn noisy, incomplete datasets into analysis-ready tables. Imputation, deduplication, format standardization, and validation — with full transparency on what changed and why.

Pandas NumPy Regex

Power BI Dashboards

Interactive, visually compelling dashboards that surface the metrics that matter. From data modeling to calculated measures to polished drill-through reports.

Power BI DAX Data Modeling

Task Automation

Automate the repetitive stuff — file processing, report generation, data syncs, web scraping. If it's boring and you do it every day, I can script it away.

Python Scripting Scheduling

Data Visualization

Publication-quality charts and visual stories that reveal patterns in your data. Custom color palettes, annotation, and multi-panel layouts for reports and presentations.

Matplotlib Seaborn Power BI

Database Design

Schema design, table relationships, and optimized queries for SQL Server and relational databases. Clean structure that makes your future queries fast and painless.

SQL SQL Server PyODBC

Let's Connect

I'm currently looking for internship or entry-level opportunities where I can contribute to a team, grow as a data professional, and help turn data into valuable insights.

Send a Message