Sanjit Kumar

I am a software engineer at Amazon Music. I recently graduated with a master's degree in computer science from the University of Illinois Urbana-Champaign. I am broadly interested in full-stack software development and in distributed and networked systems.

Outside of work, I read sci-fi/fantasy books, bike and cook.

Email  /  Resume  /  LinkedIn  /  Github /  Google Scholar

Experience
Software Development Engineer @ Amazon
Sep 2024 - present | Seattle, WA

Amazon Music: Growth and Marketing - Tier progressions.



Graduate Teaching Assistant: Distributed Systems @ Dept. of Computer Science, UIUC
Jan 2024 - May 2024 | Champaign, IL

Worked with Prof. Radhika Mittal on the course CS425: Distributed Systems, covering asynchronous clocks, global system snapshots, failure detection, concurrency control, consensus via Paxos/Raft, leader election, distributed transactions, DHTs and P2P systems, distributed datastores, etc.

Mentored students building distributed and networked application projects on reliable multicasting, distributed transactions, Raft, and distributed file systems.

Fall Software Engineer Intern @ Aviz Networks Inc.
Aug 2023 - Nov 2023 | San Jose, CA

Developed and integrated a web UI, REST API, and Redis DB Cache for a C-based Network Packet Collector. Optimized and performance-tuned the packet collector pipeline via payload batching. Achieved a 3x improvement in throughput and 50% reduction in latency. Automated the configuration, build and execution of the packet collector with Bash scripts and Python.

Summer Software Engineer Intern @ Aviz Networks Inc.
May 2023 - Aug 2023 | San Jose, CA

Designed and developed a new core feature for the company's primary product, ONE-DL, that simplifies moving network data from on-premise environments to the cloud via Kafka.

Built a network packet collector that taps packets from high-throughput data center network traffic to extract and stream metadata. Leveraged scalable event processing systems like Kafka and Elasticsearch for data pipelining and downstream analytics. Benchmarked performance on physical network devices with software-based (Scapy) and hardware-based (Ixia) load generators for scalability testing. Used a Kafka consumer to integrate a REST API with the system for selective packet capture and data sink integrations.

Software Developer Intern @ Zigma Software
Sep 2021 - Nov 2021 | Erode, India

Built a weighbridge management MERN stack web application for a company that weighs trucks and heavy motor vehicles. Programmed a dashboard for visualizing revenue metrics and wrote unit tests. Coordinated meetings with stakeholders to gather design and performance feedback.

Full Stack Developer Intern @ WebKnot Technologies
Nov 2020 - Dec 2020 | Bangalore, India

Developed UI with ReactJS and wrote REST API endpoints with Node.js and Express.js for two different MERN stack applications. Integrated custom TensorFlow object detection models with Shinobi (open-source CCTV framework).

Software Developer Lead @ Journalistic Literature Club, VIT
Feb 2020 - Jan 2021 | Vellore, India

Led a team of 6 to move the literature club's newsletter segment (The Weekly Edge) online. Developed and launched a full-scale MERN stack web app that let club members write, edit and publish articles. Tracked and increased reader traffic by 40%.

Academic Support Volunteer @ Make A Difference
Nov 2020 - Nov 2021 | Vellore, India

Tutored 10th grade English for students in Kasam Shelter attending King's School, Vellore. Planned and designed a tutoring schedule and tests over the year to prepare students for their final Public Exam.

Some Projects
IDunno: A Distributed ML Inference Task Scheduler System

Designed a distributed job scheduler for ML inference tasks, built from scratch in Java and Python on top of 10 Linux VMs. Uses a real-time work scheduling algorithm to optimize query rate for ResNet and ImageNet classification tasks. Includes a distributed data logging service, a distributed group membership protocol and failure detector, and a distributed file system.

WildSprint: A Fundraising platform for Wildlife conservation via ETH by Team Ambur Biryani
Best Ethereum Powered Project @ DevSpace '21 by CSI, VIT &
Best Dyte Powered Project @ DevSpace '21 by CSI, VIT


A fundraising platform for wildlife sanctuaries that accepts donations via ETH. Live streams from high-activity sites at the sanctuaries let the audience tune in, watch and donate for a cause. Demo: YouTube.

KYYarn: A data visualization app for inquiry and product inventory management of a textile yarn trading company

MERN Stack business intelligence web application that works as a website/catalogue for a yarn trading entity in Erode, TN, India to receive business inquiries for products. Data collected is used to create visualizations for business intelligence. Also designed and implemented an alternative JavaFX frontend.

DockerMapReduce: A distributed MapReduce simulation with docker nodes

A MapReduce algorithm for finding the frequency of unique words in a large volume of text, tested on a single machine with multiple Docker containers acting as separate nodes for map and reduce jobs. The throughput of the task is measured while scaling the data and the number of nodes up and down.
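For readers unfamiliar with the pattern, the word-frequency job can be sketched in a few lines of single-process Python. This is only an illustration of the map, shuffle and reduce phases, not the project's code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the Docker-based distribution across nodes is left out entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of text."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted counts by word, as the shuffle step would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum all counts collected for a single word."""
    return word, sum(counts)


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in-process over the given chunks."""
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    return dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
```

In a distributed setup, each chunk would go to a separate map node and each group of words to a reduce node; here everything runs in one process for clarity.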

Interactive Computer Graphics: Rasterizer, Ray Tracer, WebGL Terrain Modeling, Flight Simulation and Particle Effects

A collection of projects around interactive computer graphics: implementations of a rasterizer and a ray tracer from scratch that take descriptions of a scene and output graphic images. Also includes a collection of work with WebGL: a 3D model of an arbitrary landscape terrain created using random fault generation, illumination via diffuse/specular lighting models and erosion simulation; a flight simulation over the generated terrain; and a simulation of particles under physical effects like momentum, gravity and drag.

RealWorldBugDiscovery: An Evaluation of Open Source Real World Bugs via Popular Bug Datasets

In this project, a comprehensive benchmarking study was conducted on the complexity of real-world bugs in open-source Maven and Gradle projects. Popular bug datasets such as Defects4J, BugSwarm, Bears, and QuixBugs were used to evaluate the quality of test suites generated by automated test generation tools, specifically Randoop and Evosuite. Code statement coverage for these bugs was analyzed using tools like Clover, IntelliJ IDEA, and JaCoCo. Suspiciousness scores for code statements were calculated from the coverage reports to identify the lines of code most likely to contain bugs. Results: YouTube.
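To give a sense of how a suspiciousness score can be derived from coverage data, here is a minimal sketch using the Ochiai formula. The choice of Ochiai is an assumption made for illustration; the write-up above does not say which formula the project used. The function name `ochiai_scores` and the input shapes are likewise hypothetical.

```python
import math
from typing import Dict, Set


def ochiai_scores(
    coverage: Dict[str, Set[int]], passed: Dict[str, bool]
) -> Dict[int, float]:
    """Score each covered line with Ochiai (an assumed formula).

    coverage maps a test name to the set of line numbers it executes;
    passed maps a test name to True (pass) or False (fail).
    score(line) = ef / sqrt(total_failed * (ef + ep)), where ef and ep are
    the numbers of failing and passing tests that execute the line.
    """
    total_failed = sum(1 for ok in passed.values() if not ok)
    all_lines = set().union(*coverage.values()) if coverage else set()
    scores: Dict[int, float] = {}
    for line in all_lines:
        ef = sum(1 for t, lines in coverage.items() if line in lines and not passed[t])
        ep = sum(1 for t, lines in coverage.items() if line in lines and passed[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    cov = {"t1": {1, 2}, "t2": {2, 3}, "t3": {3}}
    res = {"t1": False, "t2": False, "t3": True}
    for line, score in sorted(ochiai_scores(cov, res).items()):
        print(line, round(score, 3))
```

Lines executed mostly by failing tests score close to 1, while lines executed only by passing tests score 0.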

Data Systems and Machine Learning: Online Aggregation, HyperLogLog, Apache Arrow-Parquet ASCII Encoder, Flatbuffers and Shared Memory

OLA: Implemented OLA to progressively obtain approximate query results for various Pandas DataFrame operations, such as filtered mean, grouped mean, and grouped sum.
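As a rough illustration of the idea, the sketch below computes a filtered mean progressively over randomly ordered rows and reports a running estimate after each batch. It uses plain Python lists rather than Pandas, and the function name `online_filtered_mean`, the batch size, and the normal-approximation interval (without finite-population correction) are all assumptions for this example, not details of the project.

```python
import math
import random
from typing import Callable, Iterator, Sequence, Tuple


def online_filtered_mean(
    values: Sequence[float],
    predicate: Callable[[float], bool],
    batch_size: int = 100,
    seed: int = 0,
    z: float = 1.96,
) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, estimate, half_width) after each batch.

    Rows are shuffled first so each prefix behaves like a random sample.
    The mean and variance are updated incrementally (Welford's method).
    """
    rows = list(values)
    random.Random(seed).shuffle(rows)
    n, mean, m2 = 0, 0.0, 0.0
    for start in range(0, len(rows), batch_size):
        for x in rows[start:start + batch_size]:
            if not predicate(x):
                continue
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        rows_seen = min(start + batch_size, len(rows))
        estimate = mean if n else float("nan")
        # Approximate 95% interval; infinite until two matching rows are seen.
        half_width = z * math.sqrt(m2 / (n - 1) / n) if n > 1 else float("inf")
        yield rows_seen, estimate, half_width


if __name__ == "__main__":
    data = list(range(1000))
    for seen, est, hw in online_filtered_mean(data, lambda v: v % 2 == 0):
        print(f"{seen:5d} rows: mean ~ {est:.1f} +/- {hw:.1f}")
```

The estimate tightens as more rows are processed and equals the exact filtered mean once every row has been seen.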

Apache Arrow-Parquet Encoder: Implemented HyperLogLog to estimate the dataset cardinality. Modified the Apache Arrow source code to implement a new ASCII Encoder and Decoder for Parquet, supporting the Integer and Float data types.
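For context, a compact HyperLogLog estimator can be sketched as follows. This is a textbook-style illustration and not the code from the project: the class name, the use of SHA-1 as the hash, the default precision `p=12`, and the small-range linear-counting correction are assumptions chosen for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (assumes p >= 7)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: object) -> None:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # use 64 bits of the hash
        index = h >> (64 - self.p)             # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(50_000):
        hll.add(f"user-{i % 10_000}")  # 10,000 distinct values
    print(round(hll.estimate()))
```

Memory use stays fixed at `2**p` small registers regardless of how many items are added, which is what makes the structure attractive for cardinality estimates on large datasets.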

Flatbuffers & Shared Memory: Worked with Google's Flatbuffers and Python's shared memory libraries to pass serialized DataFrames between notebook sessions and to perform various operations directly on the serialized DataFrames. Flatbuffers provides faster serialization and shared memory handles inter-process communication, which together make data sharing and manipulation in Python more efficient.

Linux Kernel Development: Kernel Page Fault Profiler, Rate Monotonic Scheduler, Computing Kernel CPU Time

Page Fault Profiler: A kernel-space page fault profiler that monitors page faults and CPU run times via a virtual memory buffer and a delayed work queue at 20 samples per second, and manages process registration through the proc interface.

Rate Monotonic Scheduler: A real-time CPU scheduler implemented as a Linux kernel module, based on Rate-Monotonic Scheduling (RMS) for single-core processors and using the proc filesystem for user space communication.
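The core RMS rules are easy to show in user-space Python: tasks with shorter periods get higher priority, and a task set is guaranteed schedulable when its total utilization stays under the Liu and Layland bound n(2^(1/n) - 1). The sketch below illustrates only those two rules. The `Task` type and function names are hypothetical, and whether the kernel module uses this exact admission test is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    computation_ms: float  # worst-case processing time per period
    period_ms: float


def rms_priority_order(tasks: List[Task]) -> List[Task]:
    """Shorter period means higher priority under rate-monotonic scheduling."""
    return sorted(tasks, key=lambda t: t.period_ms)


def liu_layland_bound(n: int) -> float:
    """Utilization bound n * (2**(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def admits(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) schedulability test on a single core."""
    if not tasks:
        return True
    utilization = sum(t.computation_ms / t.period_ms for t in tasks)
    return utilization <= liu_layland_bound(len(tasks))


if __name__ == "__main__":
    tasks = [Task("log", 2, 10), Task("sensor", 1, 4), Task("ui", 1, 5)]
    print([t.name for t in rms_priority_order(tasks)])
    print(admits(tasks))
```

Because the bound is only sufficient, a task set that fails this check may still be schedulable; an exact answer would need response-time analysis.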

Computing CPU Time: A Linux kernel module for process management using a doubly linked list and proc filesystem, featuring periodic CPU time updates via a workqueue and timer callbacks.

Publications
A Survey on Internet of Things Security : Attacks, Solutions, Strengths and Limitations
International Conference on Artificial Intelligence and Machine Vision (AIMV), 2021

Presented a comparative analysis of benchmarks for the latest security frameworks from then-recent IoT literature, advised by Dr. Anil Kumar Kakelli. Classified and critiqued existing IoT security frameworks based on their approaches to addressing the threat of malignant nodes in heterogeneous device networks and on their general strengths and limitations.


design and source code from Jon Barron's website