Description
Language: English
*Required Skills:*
- 5-7 or more years of work experience in data engineering.
- Develop multiple data warehouses and data marts with varied business logic using Microsoft SQL Server, AWS Redshift, AWS RDS, and PostgreSQL databases.
- Implement advanced concepts like OLTP and OLAP using Microsoft SQL Server, PostgreSQL, AWS Redshift, and AWS RDS.
- Manage and configure SQL databases like PostgreSQL, Microsoft SQL Server, and AWS Redshift as an administrator and developer.
- Implement Spark parallelize across multiple Spark nodes to help ingest and transform huge datasets using PySpark.
- Import data from AWS S3 into Spark RDDs.
- Use Spark SQL to query DataFrames and SparkSession data.
- Implement an ETL framework using Spark with Python and load standardized data into Hive and HBase tables.
- Develop various complex ETL flows, including data extraction, data storage, data warehousing, and dimensional and data modeling, with SSIS and Python (Pandas, Airflow, and PySpark).
- Migrate databases from on-premises infrastructure to cloud-based infrastructure for Microsoft SQL Server, PostgreSQL, and MongoDB databases.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and scaling.
- Use PL/SQL scripts run from shell scripts to clean, tune, and load data into databases.
- Use tools like pg_loader, the bcp utility, and TimescaleDB parallel copy during migrations to automate bulk data transfers.
- Use shell scripts to clean source files for consumption, schedule jobs, and transfer files from source to destination.
- Extract, load, and transform data using PySpark and Spark SQL on Spark clusters hosted on AWS and Databricks.
- Optimize existing Spark clusters hosted on AWS and Databricks.
- Develop multiple data visualization reports using Tableau, Power BI, and SSRS with filters and charts.
- Use advanced Power BI features like RLS with binning/grouping and outlier grouping to create user-specific views.
- Use M queries to enable custom data ingestion per customer requirements during Power BI dataset refresh.