
Thiago Baptista
Engineer. Builder. Problem Solver.
Senior Data & Analytics Engineer specializing in Data Lakehouse and Big Data solutions. Building end-to-end analytical solutions with Databricks, Apache Spark, Delta Lake, and Microsoft Fabric.
About Me
I’m Thiago Baptista, a Senior Data & Analytics Engineer with over two decades of experience turning raw data into strategic assets. My journey started with relational databases and BI in 2000 and evolved into modern Data Lakehouse architectures with Databricks, Apache Spark, and Microsoft Fabric. I design and build end-to-end analytical solutions: from ingestion pipelines to Gold-layer data products that power executive dashboards and drive business decisions.
Quick Facts
Current Role
Senior Analytics Engineer at Natura (via HVAR)
Education
Federal University of Espírito Santo (UFES), B.Sc. Computer Science
Certifications
- Databricks Certified Data Engineer Associate
- Microsoft Certified: Fabric Data Engineer Associate
- Microsoft Certified: Azure Data Fundamentals
- GitHub Foundations
Languages
Skills
End-to-end data engineering and analytics, from Data Lakehouse design to governed data products that drive business decisions.
Lakehouse Analytics
Data Lakehouse with Databricks and Microsoft Fabric. Medallion architecture, Delta Lake, dimensional modeling, Silver-to-Gold transformations with PySpark and Spark SQL, business metric definition, and data delivery for Tableau and Power BI.
Pipelines & CI/CD
Scalable data pipelines with Apache Spark and Delta Lake. CI/CD automation with Azure DevOps and GitHub Actions for deployment and versioning of data artifacts.
Governance & Quality
Unity Catalog and Microsoft Purview. Data cataloging, lineage tracking, access control, metric dictionaries. Data quality enforced with Delta Live Tables expectations, unit tests, and validation checks.
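To make the idea of row-level quality expectations concrete, here is a minimal sketch in plain Python. It is only an illustration and is not taken from any project listed here: the `Expectation` class, the `apply_expectations` helper, the rule names, and the sample rows are all hypothetical, and it does not use the Delta Live Tables API. Each rule is a named predicate, and rows that fail any rule are set aside together with the names of the rules they broke.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Expectation:
    """A named rule that every row is expected to satisfy (hypothetical helper)."""
    name: str
    predicate: Callable[[dict], bool]


def apply_expectations(rows, expectations):
    """Split rows into (valid, rejected); rejected entries list the failed rule names."""
    valid, rejected = [], []
    for row in rows:
        failed = [e.name for e in expectations if not e.predicate(row)]
        if failed:
            rejected.append({"row": row, "failed": failed})
        else:
            valid.append(row)
    return valid, rejected


# Illustrative rules; the column names are assumptions for this sketch.
EXPECTATIONS = [
    Expectation("order_id_not_null", lambda r: r.get("order_id") is not None),
    Expectation(
        "revenue_non_negative",
        lambda r: r.get("revenue") is not None and r["revenue"] >= 0,
    ),
]

sample_rows = [
    {"order_id": 1, "revenue": 120.0},
    {"order_id": None, "revenue": 35.5},
    {"order_id": 3, "revenue": -10.0},
]

valid, rejected = apply_expectations(sample_rows, EXPECTATIONS)
for entry in rejected:
    print(entry["row"], "failed:", entry["failed"])
```

Keeping rejected rows alongside the reason they failed, rather than silently dropping them, is one way to make quality issues visible to downstream consumers.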
Leadership & Industry
Leading multidisciplinary teams, defining architecture standards, and translating business requirements into data solutions. Sectors: retail, manufacturing, agribusiness, cosmetics, logistics, and government.
Experience
From relational databases to the modern Data Lakehouse: more than two decades building data solutions that drive strategic decisions.
Natura (via HVAR)
Senior Analytics Engineer
Latin America's largest cosmetics company, operating across multiple countries and brands
Key Achievements
- Design and build Gold-layer data products in Databricks feeding executive Tableau dashboards
- Silver-to-Gold transformations with PySpark and Spark SQL: OBT, Star Schema, SCD patterns
- Multi-brand, multi-country financial and commercial KPIs: revenue, pricing, approval rates, customer lifecycle
- Business metric definition and documentation aligned with Natura’s Data Hub standards and Unity Catalog
- DataViz enablement: query packages with field-level documentation for Tableau developers
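For readers unfamiliar with the SCD patterns mentioned above, the following is a minimal sketch of a Type 2 slowly changing dimension in plain Python. It is an illustration under stated assumptions, not the implementation used at Natura: the `apply_scd2` function, the `valid_from` / `valid_to` / `is_current` columns, and the sample customer data are all hypothetical. In a lakehouse the same logic would more typically be expressed as a merge over Delta tables.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new dimension list with Type 2 history applied.

    A changed record closes its current version and appends a new one;
    an unchanged record is left alone; an unseen key is inserted.
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        existing = current.get(new[key])
        if existing is not None and all(existing[c] == new[c] for c in tracked):
            continue  # no tracked attribute changed, keep the current version

        if existing is not None:
            # Close the previous version instead of overwriting it.
            existing["valid_to"] = as_of
            existing["is_current"] = False

        version = {
            key: new[key],
            **{c: new[c] for c in tracked},
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        }
        result.append(version)
        current[new[key]] = version

    return result


# Hypothetical sample data for the sketch.
dim_customer = [
    {"customer_id": 1, "segment": "standard",
     "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True},
]
incoming = [
    {"customer_id": 1, "segment": "premium"},   # changed attribute
    {"customer_id": 2, "segment": "standard"},  # new customer
]

updated = apply_scd2(dim_customer, incoming, "customer_id", ["segment"], date(2024, 6, 1))
for row in updated:
    print(row)
```

The design choice worth noting is that history is preserved by closing the old row rather than updating it in place, which is what lets customer lifecycle metrics be computed as of any past date.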
Natura (via HVAR)
Senior Data Engineer
Data modernization and SAP-Databricks integration project
Key Achievements
- Technical lead for SAP data ingestion (ECC, BW) into Databricks via SAP BDC and AecorSoft connectors
- Bronze/Silver/Gold layer design and evolution based on Delta Lake
- Spark/PySpark pipeline optimization with advanced partitioning and data versioning
- CI/CD pipelines for notebook and workflow deployment using GitHub
- Technical standards: naming conventions, table versioning, Unity Catalog documentation
Prodesp (via AlmavivA)
Senior Data Engineer
GDAP - Digital Cabinet for Public Administration, serving the Governor of São Paulo State
Key Achievements
- Scalable data pipelines with Databricks and Spark for state-wide data integration
- Data Lakehouse architecture to unify and democratize access to government data
- CI/CD with Azure DevOps: fully automated pipelines for ingestion, transformation, and deployment
- Power BI semantic models and dashboards for strategic government consumption
Prodesp (via AlmavivA)
Senior Database Administrator
Supporting the São Paulo State Secretary of Economic Development
Key Achievements
- Advanced SQL Server: views, CTEs, stored procedures, T-SQL scripts for government programs
- Production environment support with proactive monitoring and SLA-based incident management
- Python automation scripts for data extraction, analysis, and integration
Elever Vision
Senior Data Engineer
Data engineering consulting across retail, manufacturing, media, and entertainment
Key Achievements
- Data Lakes and Data Warehouses with Azure Storage, Azure SQL Database, SQL Server, PostgreSQL
- ETL pipeline orchestration with Azure Data Factory, Apache Airflow, and SSIS
- CI/CD pipelines with Azure DevOps and GitHub for data lifecycle governance
- Power BI dashboards and Python (pandas) automation for large-scale data processing
Vale / VLI
Data Engineer
BI and Data Warehousing for railway and port logistics operations
Key Achievements
- Technical lead for a 15-person team building a Data Warehouse for FP&A processes
- Multidimensional modeling in SQL Server: fact/dimension tables, complex T-SQL
- ETL orchestration with SSIS integrating SAP, Oracle ERP, Cognos, and operational systems
- Financial analysis solutions for demand forecasting and cost allocation
Fundação Ceciliano Abel de Almeida
Data Engineer
Non-profit affiliated with the Federal University of Espírito Santo (UFES)
Key Achievements
- BI layer over ERP Sapiens using SQL Server for advanced analytics and management reporting
- Database administration: capacity planning, backup, disaster recovery, high availability
- ETL automation with Visual Basic; dashboards built with Crystal Reports
Tech Stack
Core technologies spanning data platforms, programming, orchestration, and visualization.
Big Data & Processing
Data Platforms
Programming & Query Languages
Data Governance
Databases
Visualization
Cloud & Storage
CI/CD & DevOps
Project Management
Let's Connect
Contact Information
Interested in discussing data architecture, lakehouse solutions, or collaboration opportunities? Let's connect.