Cognizant Technology Solutions
Nov 2022 – Present
• Worked on a data migration project for a U.S.-based pharmaceutical client, migrating the existing system to Apache Iceberg using PySpark.
• Developed custom code for ingesting raw data from multiple sources: Oracle (via JDBC), Salesforce APIs, and file-based formats (CSV, text, XML), with data stored in AWS S3.
• Executed data transformations across raw, TL, and ATL layers, creating cross-references (XREFs) and harmonized golden records for critical tables.
• Developed an optimized PySpark solution for writing data from Apache Iceberg tables to Salesforce schemas via APIs, serving downstream applications.
• Analyzed the existing framework and complex SQL code, reimplementing it with dynamic, generic scripts and queries to improve efficiency.
• Improved data quality and ensured seamless data orchestration using the Modak Nabu tool.
• Collaborated with the team to lead critical deployments and resolve challenges in production environments.
• Applied advanced data engineering concepts to refine workflows, boosting pipeline efficiency and enhancing data quality, leading to a 25% reduction in errors.
• Optimized data pipelines to enhance performance and ensure seamless data availability for outbound systems, improving accuracy and reducing processing time by 30%.
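The XREF and golden-record harmonization mentioned above can be illustrated with a minimal, Spark-free sketch. The record shapes, source names, matching key (email), and priority order here are all hypothetical; the production pipeline ran as PySpark jobs over Apache Iceberg tables.

```python
# Minimal sketch of XREF / golden-record harmonization.
# Assumptions: records carry a "source", a "source_id", and a business
# key ("email"); higher-priority sources win field conflicts.

SOURCE_PRIORITY = ["oracle", "salesforce", "file"]  # assumed trust order

def build_xref(records):
    """Map each (source, source_id) pair to a shared golden ID
    derived from a common business key (here: lowercased email)."""
    xref = {}
    for rec in records:
        xref[(rec["source"], rec["source_id"])] = rec["email"].lower()
    return xref

def harmonize(records):
    """Merge records sharing a golden ID into one golden record,
    filling each field from the highest-priority source that has it."""
    by_golden = {}
    for rec in records:
        by_golden.setdefault(rec["email"].lower(), []).append(rec)
    golden = {}
    for gid, recs in by_golden.items():
        recs.sort(key=lambda r: SOURCE_PRIORITY.index(r["source"]))
        merged = {"golden_id": gid}
        for rec in recs:
            for field, value in rec.items():
                if field in ("source", "source_id", "email"):
                    continue
                merged.setdefault(field, value)  # first (highest-priority) wins
        golden[gid] = merged
    return golden

records = [
    {"source": "salesforce", "source_id": "SF-1", "email": "A@x.com", "phone": "555-0100"},
    {"source": "oracle", "source_id": "O-9", "email": "a@x.com", "name": "Alice"},
    {"source": "file", "source_id": "F-3", "email": "a@x.com", "name": "A. Smith", "phone": "555-9999"},
]

xref = build_xref(records)      # XREF: source keys -> golden ID
golden = harmonize(records)     # one golden record per golden ID
```

In a PySpark implementation the same logic would typically be a `groupBy` on the business key followed by a priority-ordered window or aggregation, with the XREF persisted as its own Iceberg table.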
Cognizant Technology Solutions
Feb 2022 – July 2022
• Internship training domain: Big Data and PySpark
• Trained on Hadoop, Hadoop YARN, Pig, Hive, HBase, Apache Spark, Apache Kafka, Apache Flume, Apache Sqoop, Apache NiFi, Data Warehouse Fundamentals, ZooKeeper, Scala, PySpark, and ETL
• Performed different operations on various datasets using Scala and PySpark
• Studied and implemented Apache Spark modules such as Spark SQL, Spark Streaming, and MLlib
• Used Pig and Hive to perform various operations on datasets