INTRODUCTION


ADRIANA ESTRADA

DATA ENGINEER

Email : adr.estrada@gmail.com
GitHub :

My name is Adriana Estrada. With more than five years of experience as a Data Engineer (BI), I have established a solid foundation in designing, implementing, and maintaining robust data infrastructure. My expertise lies in ensuring the availability, reliability, and optimized performance of data systems. I am proficient in executing ETL processes, developing efficient data pipelines, and crafting intricate data models, with a keen focus on transforming raw data into actionable insights. My skill set spans a broad range of data technologies, including SQL, NoSQL, Hadoop, Spark, and cloud platforms such as AWS and Azure. My ability to optimize data workflows and uphold data quality has consistently supported data-driven decision-making across organizations.

Collaborative and adaptable, I excel at working alongside cross-functional teams and am adept at navigating ever-evolving data landscapes. I am deeply committed to advancing data architecture and spearheading innovation in data engineering, which positions me as a valuable asset in any data-centric role.

AREAS OF EXPERIENCE


Data Analysis and Machine Learning : Big Data Analytics, Data Modeling, Machine Learning.

Business Analytics : Business-oriented data modeling and specifications.

ETL and Data Integration : Data Extraction, Manipulation, and Modeling; ETL Process APIs; Automating Data Integration and Transformation; Building Data Warehouses and APIs (applications).

Big Data Development and Technologies : Data Development (Python, Scala, Java, SQL, Pig, Hive), Hadoop Architecture, Apache Spark, Kafka, HDFS, Hive, Pig, NoSQL/SQL, Apache Avro.

Data Quality and Governance : Data Quality Analysis, with experience in Master Data Management (MDM).

Business Intelligence and Reporting : SSIS, SSAS, Data Analysis & Reporting (Tableau, Power BI).

Cloud Computing and AWS: AWS Cloud Computing (VPC, EC2, S3, Athena, RDS, EMR, DynamoDB, Route 53, CloudFront, Redshift, Glue), Building Websites and Web Apps on AWS services.

Machine Learning and Predictive Modeling : Machine Learning (regression models, SparkML), Data Mining, and Data Modeling.

Database Management : Strong SQL Skills (SQL Server, Hive, Spark), Oracle | SSIS | SSAS | OLAP | OLTP, Strong In-memory database knowledge.

Web and Database Programming : HTML, CSS, JavaScript, PHP.

TECHNICAL SKILLS


Here are some technologies I have been working with:

Programming & Development Technologies: Python, Scala, Java (OOP), JavaScript, Bootstrap.

Cloud Computing and Cloud Platforms: Hadoop clusters, Amazon Web Services (AWS), Azure Data Factory.

Data Processing Technologies and Frameworks: Apache (Spark, Spark SQL, Spark Streaming, Kafka, Hadoop, HDFS, Hive, ZooKeeper, YARN/MapReduce/Pig), SparkML, Avro, JDBC, REST, Parquet, API, JSON.

Database Technologies and Data Management: SQL Server management, RDBMS, MySQL, Oracle 11g.

Cloud Data Services and Data Warehousing: Snowflake, Azure Data Factory, AWS (S3, Redshift, Glue, EMR, Athena, RDS, Lambda, VPC, EC2, Kinesis).

Web Technologies and Front-End Design: HTML5, CSS3, XML, Photoshop.

Business Intelligence and Data Visualization: Tableau, Power BI, VizQL.

Representation & Modeling Tools: Visio, Draw.io.

Version Control, Development Collaboration, and CI/CD: Git, DevOps, Agile/Scrum.

Containers and Virtualization: Docker.

Data Governance and Data Quality Management: Informatica Intelligent Cloud Services (IICS), Markit EDM.

Security Information and Event Management (SIEM), Real-Time Monitoring and Alerting: Splunk.

EXPERIENCE


DATA ENGINEER

AGNICO EAGLE
Montreal, Canada

  • Worked closely with Data Analysts, Solutions Engineers, and mining domain experts to understand their data needs, designing optimized data models from various data sources to meet business requirements.
  • Designed, created, and managed datasets and databases to meet the data needs of the organization.
  • Used Azure Data Factory (ADF) for data orchestration; designed and implemented scalable data pipelines with ADF and Databricks. Developed and maintained robust ETL (Extract, Transform, Load) processes to ensure clean and accurate data for analysis.
  • Created Delta tables in a medallion architecture for efficient data storage (a minimal sketch follows this list).
  • Worked with MS SQL Server Integration Services (SSIS) and applied strong T-SQL skills, including stored procedures and triggers.
  • Debugged failure cases in the production environment to get the system up and running.
  • Collaborated with data analysts and business stakeholders to translate data needs into actionable insights through Power BI reports and dashboards.
  • Troubleshot SSIS packages in on-premises environments and added data-manipulation stored procedures in SQL Server.
  • Developed, tested, and maintained SQL queries and stored procedures built to established standards.
  • Refactored SQL Server databases to Azure SQL Data Warehouse (mapping, design, APIs, database design) and used Power BI to build compelling reports and dashboards (DAX).
  • Worked across the Azure suite: Azure SQL Database, Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, and Azure Blob Storage.
  • Built Power BI dashboards, mapped views, and refactored models from SQL Server on Databricks end to end.
  • Wrote statistics scripts on Databricks (Matplotlib).
  • Applied data-cleaning techniques using Python and PySQL to ensure high data quality and consistency, and built databases.
  • Performed cloud data migration and integration.
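
To illustrate the Delta and medallion work above, here is a minimal PySpark sketch of how a bronze/silver/gold flow can look on Databricks; the table names, paths, and cleaning rules are placeholders I chose for the example, not the actual Agnico Eagle pipeline.

```python
# Minimal medallion (bronze/silver/gold) sketch on Databricks with PySpark.
# Table names, paths, and cleaning rules are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# Bronze: land raw source data as-is in a Delta table.
raw = spark.read.json("/mnt/raw/sensors/")  # hypothetical landing path
raw.write.format("delta").mode("append").saveAsTable("bronze.sensors")

# Silver: deduplicate and conform the bronze data.
silver = (
    spark.table("bronze.sensors")
    .dropDuplicates(["sensor_id", "event_ts"])
    .filter(F.col("reading").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.sensors")

# Gold: business-level aggregate consumed by Power BI.
gold = silver.groupBy("site", "event_date").agg(F.avg("reading").alias("avg_reading"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_site_readings")
```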

ANALYST, DATA QUALITY MANAGEMENT

PSP INVESTMENT
Montreal, Canada

  • Served as an intermediary between the PSP client and the IT team to ensure a clear understanding of business needs and alignment of solutions with expectations.
  • Implemented controls.
  • Produced business-oriented data modeling and specifications; business analytics.
  • Defined data quality and validation rules to maintain high internal data quality across multiple asset classes (see the sketch after this list).
  • Entered business needs into the tracking tool and ensured that priorities were well understood and integrated across all applications.
  • Evaluated the performance and design of the system to assess its impact on data quality.
  • Explored and curated large volumes of data across multiple asset classes, leveraging BI tools to accelerate data-driven decision-making and transform group operations and strategy.
  • Developed controls and approaches to identify data quality anomalies, along with strategies for their correction.
  • Created documentation and artifacts for the data quality, alerting, and monitoring rules engine.
  • Informatica ETL development (IDQ): responsible for implementing data quality processes, including design and development of complex mappings, transformations, and procedures (e.g., lookup, source qualifier, update strategy, aggregator, router, sequence generator, range, stored procedure, filter, assembler, and classifier transformations).
  • Deployed mappings that run in scheduled, batch, or real-time environments.
  • Supported data quality and data governance.
  • Utilized Power BI and Tableau for data visualization and analysis.
  • Developed calculations, parameters, dashboard actions, and other components to enhance data visualization.
  • Defined measurement indicators and data-driven control systems that support the maintenance of initiatives, decision-making, and action.
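
As a rough illustration of the kind of validation rules described above, here is a minimal sketch expressed in PySpark rather than the Informatica IDQ mappings used in practice; the dataset, column names, and thresholds are hypothetical examples.

```python
# Simplified data-quality checks in PySpark (illustrative only; the production rules
# were built as Informatica IDQ mappings). Columns and thresholds are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
positions = spark.read.parquet("/data/positions/")  # hypothetical asset-class dataset

summary = positions.select(
    F.count("*").alias("row_count"),
    F.sum(F.col("market_value").isNull().cast("int")).alias("null_market_value"),
    F.sum((F.col("quantity") < 0).cast("int")).alias("negative_quantity"),
    F.countDistinct("instrument_id").alias("distinct_instruments"),
).collect()[0]

# Route anomalies for correction when a rule is violated.
if summary["null_market_value"] > 0 or summary["negative_quantity"] > 0:
    print(f"Data-quality alert: {summary.asDict()}")
```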

DATA DEVELOPER AND VISUALIZATION ANALYST

MCIT
Montreal, Canada
  • Provided technical documentation and UML diagrams capturing functional requirements and design.
  • Worked with the Hadoop ecosystem and tool sets: MapReduce, Pig, Spark, HDFS, Hive, and HBase (including configuring the Hadoop API).
  • Designed and implemented a real-time data pipeline to process semi-structured data, integrating raw records from different data sources using Kafka, Spark Streaming, Scala, Hive, and S3 (a simplified sketch follows this list).
  • Applied analytics experience with RDBMS.
  • Developed Hadoop integrations (batch and streaming) for data ingestion, data mapping, and data processing.
  • Used Spark and Scala to distribute processing of large streaming datasets, improving ingestion and processing speed in the pipeline by 67% in near real time.
  • Automated data pipelines across the data ecosystem.
  • Programmed in both compiled languages (Scala, Java) and scripting languages (Python).
  • Developed, designed, built, tested, optimized, and maintained a data warehouse using multiple tools.
  • Performed big data performance analysis, tuning, and capacity planning.
  • Designed a full batch data pipeline and ETL job using Spark and Hive, pulling data daily and running an ETL pipeline to enrich it for reporting and analysis.
  • Developed Tableau dashboards as data visualization solutions for client services.
  • Optimized a low-latency, real-time auction bidding application on an advertising platform.
  • Designed a batch ETL job using S3 and Athena, and designed a cloud-native full batch data pipeline.
  • Converted JSON files to CSV and wrote them to a directory on HDFS, using the Circe JSON library to parse JSON documents into CSV records.
  • Integrated and deployed a server with Amazon AWS.
  • Implemented a RESTful API using Avro and Parquet schemas.
  • Familiar with building containers (Docker).
  • Familiar with machine learning (ML).
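
To give a concrete flavor of the streaming pipeline above, here is a simplified sketch of a Kafka-to-Spark-to-S3 flow; I express it in PySpark for brevity (the original pipeline was written in Scala), and the topic name, schema, and paths are placeholders.

```python
# Simplified Kafka -> Spark Structured Streaming -> S3 sketch. Written in PySpark for
# brevity; the pipeline described above was built in Scala. Topic, schema, and paths
# are hypothetical placeholders. Requires the spark-sql-kafka connector package.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka_to_s3").getOrCreate()

schema = StructType([
    StructField("record_id", StringType()),
    StructField("value", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read semi-structured JSON records from Kafka and parse them against the schema.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "raw-events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Enrich and write the stream to S3 as Parquet, partitioned by day, in near real time.
query = (
    events.withColumn("event_date", F.to_date("event_ts"))
    .writeStream.format("parquet")
    .option("path", "s3a://example-bucket/enriched/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/")
    .partitionBy("event_date")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```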

ANALYST, OPERATIONAL RELIABILITY

ECOPETROL
Santander, Colombia
  • Supported predictive and preventative maintenance, participated in root cause investigations, tracked and reported KPIs, and implemented continuous improvement strategies using accepted best practices.
  • Led the implementation of the Operational Integrity program in the ECOPETROL Superintendency of Operations.
  • Guided and advised teams so that work was carried out following the established methodology and standards (best practices).
  • Resolved questions that arose during the execution of the work.
  • Provided the supporting documentation required to meet objectives.
  • Monitored the operational status of equipment to manage and ensure availability.
  • Aligned all activities to ensure the integrity of the program.
  • Defined the procedure for structured operational rounds.
  • Reported on risks identified in equipment across the different operating processes.
  • Reported deviations found in equipment and/or systems and coordinated Maintenance and Operations management for their closure.
  • Trained staff in the modules of the Operational Integrity Program.
  • Supported monitoring elements of Process Safety Management in SCI.
  • Supported the Process Safety Administration program.

EDUCATION


    BIG DATA DEVELOPER, BUSINESS INTELLIGENCE AND VISUALIZATION ANALYST
    MONTREAL COLLEGE OF INFORMATION TECHNOLOGY.

    CHEMICAL ENGINEERING
    INDUSTRIAL UNIVERSITY OF SANTANDER.

    CERTIFICATIONS


    • SOFTWARE ENGINEERING BASICS FOR EVERYONE CERTIFICATE
      IBM.
    • Supervised Machine Learning: Regression and Classification
      Coursera, Stanford University.
    • Advanced Learning Algorithms
      Coursera, Stanford University.
    • Getting Started with Amazon EMR
      AWS Training and Certifications.
    • Getting Started with AWS Glue
      AWS Training and Certifications.
    • ARTIFICIAL INTELLIGENCE (AI) TECHNIQUES: FROM FOUNDATIONS TO APPLICATIONS
      UNIVERSITÉ DE MONTRÉAL.
    • Introduction to EC2 Auto Scaling
      AWS Training and Certifications.
    • Amazon Redshift Service Primer
      AWS Training and Certifications.
    • Introduction to Amazon Elastic MapReduce (EMR)
      AWS Training and Certifications.

    PERSONAL PROJECTS


    "Plans and strategies that improve your live"

    Outside of work, I'm interested in following developments in science. I also experiment with robot structures and create content that I publish in Git repositories using Amazon services: I host my personal website with CloudFront and Route 53, and I publish on YouTube.


    CONTACTS


    Say hi!

    Email me at : adr.estrada@gmail.com

    LinkedIn :