Sanjit Kumar
I am a software engineer at Amazon Music. I recently graduated with a master's degree in computer science from the University of
Illinois Urbana-Champaign. I am broadly interested in full-stack software development and
distributed and networked systems.
Outside of work, I read sci-fi/fantasy books, bike, and cook.
Email / Resume / LinkedIn / GitHub / Google Scholar
|
|
|
Software Development Engineer @
Amazon
Sep 2024 - present | Seattle, WA
Amazon Music: Growth and Marketing - Tier progressions.
|
|
Graduate Teaching Assistant: Distributed Systems @
Dept. of Computer Science, UIUC
Jan 2024 - May 2024 | Champaign, IL
Worked with Prof. Radhika Mittal on the course CS425: Distributed Systems, covering
asynchronous clocks, global system snapshots, failure detection, concurrency
control, consensus via Paxos/Raft, leader election, distributed transactions, DHTs
and P2P systems, and distributed datastores.
Mentored students building distributed and networked application projects on
reliable multicasting, distributed transactions, Raft, and a distributed file system.
|
|
Fall Software Engineer Intern @
Aviz Networks Inc.
Aug 2023 - Nov 2023 | San Jose, CA
Developed and integrated a web UI, REST API, and Redis DB cache for a C-based network packet
collector. Optimized and performance-tuned the packet collector pipeline via payload batching,
achieving a 3x improvement in throughput and a 50% reduction in latency. Automated the configuration,
build, and execution of the packet collector with Bash scripts and Python.
|
|
Summer Software Engineer Intern @
Aviz Networks Inc.
May 2023 - Aug 2023 | San Jose, CA
Designed and developed a new core feature for the company's primary product, ONE-DL, that simplifies
moving network data from on-premise environments to the cloud via Kafka.
Built a network packet collector that taps packets from high-throughput data center network
traffic to extract and stream metadata.
Leveraged scalable event processing systems like Kafka and Elasticsearch for data pipelining and
downstream analytics. Benchmarked performance on physical network devices with software-based (Scapy)
and hardware-based (Ixia) load generators for scalability testing. Used a Kafka consumer to integrate a
REST API with the system for selective packet capture and data sink integrations.
|
|
Software Developer Intern @
Zigma Software
Sep 2021 - Nov 2021 | Erode, India
Built a weigh-bridge management MERN stack web application for a company that weighs trucks and
heavy motor vehicles. Programmed a dashboard for visualizing revenue generation metrics and
wrote unit tests. Coordinated meetings with stakeholders to gather design and performance feedback
and drive improvements.
|
|
Full Stack Developer Intern @
WebKnot Technologies
Nov 2020 - Dec 2020 | Bangalore, India
Developed UIs with ReactJS and wrote REST API endpoints with Node.js and Express.js for two different
MERN stack applications. Integrated custom TensorFlow object detection models with Shinobi
(an open-source CCTV framework).
|
|
Software Developer Lead @
Journalistic Literature Club, VIT
Feb 2020 - Jan 2021 | Vellore, India
Led a team of 6 to move the literature club's newsletter segment (The Weekly Edge) online.
Developed and launched a full-scale MERN stack web app that facilitated writing, editing, and
publishing articles by club members. Tracked and increased reader traffic by 40%.
|
|
Academic Support Volunteer @
Make A Difference
Nov 2020 - Nov 2021 | Vellore, India
Tutored 10th grade English for students in Kasam Shelter attending King's School, Vellore. Planned
and designed a tutoring schedule and tests over the year to prepare students for their final Public
Exam.
|
|
IDunno:
A Distributed ML Inference Task Scheduler System
Designed a distributed job scheduler system for ML inference tasks, built from scratch on top of
10 Linux VMs using Java and Python. Uses a real-time work scheduling algorithm to optimize query rate for
ResNet and ImageNet classification tasks. Includes a distributed data logging service, a distributed
group membership protocol and failure detector, and a distributed file system.
|
|
WildSprint:
A Fundraising platform for Wildlife conservation via ETH by Team Ambur Biryani
Best Ethereum Powered Project @ DevSpace '21 by CSI, VIT &
Best Dyte Powered Project @ DevSpace '21 by CSI, VIT
A fundraising platform for wildlife sanctuaries that allows donations via ETH. Live streams
from high-activity sites at the sanctuaries let the audience tune in, watch, and
donate for a cause. Demo: YouTube.
|
|
KYYarn:
A data visualization app for inquiry and product inventory management of a textile yarn trading
company
MERN Stack business intelligence web application that works as a website/catalogue for a yarn trading
entity in Erode, TN, India to receive business inquiries for products. Data collected is used to
create visualizations for business intelligence.
Also designed and implemented an alternative JavaFX frontend.
|
|
DockerMapReduce:
A distributed MapReduce simulation with Docker nodes
A MapReduce algorithm for finding the frequency of unique words in a large volume of text, tested on a single
machine with multiple Docker containers acting as separate nodes for map and reduce jobs. The
throughput of the task is checked by scaling the data and nodes up and down.
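The map, shuffle, and reduce steps of a word-frequency job like the one described above can be sketched in a single process. This is only an illustrative sketch, not the project's code: the function names, the partitioning rule, and the in-memory lists standing in for Docker nodes are all assumptions.

```python
from collections import Counter
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs, num_reducers):
    """Route pairs so that a given word always lands on the same reducer.

    The partition rule (first character code modulo num_reducers) is a
    hypothetical choice made for this sketch.
    """
    partitions = [[] for _ in range(num_reducers)]
    for word, count in mapped_pairs:
        partitions[ord(word[0]) % num_reducers].append((word, count))
    return partitions


def reduce_phase(partition):
    """Reduce step: sum the counts for each word in one partition."""
    totals = Counter()
    for word, count in partition:
        totals[word] += count
    return totals


def word_frequencies(chunks, num_reducers=2):
    """Run map, shuffle, and reduce over a list of text chunks."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    result = Counter()
    for partition in shuffle(mapped, num_reducers):
        result.update(reduce_phase(partition))
    return dict(result)
```

Each chunk stands in for the input split handled by one map node, and each partition for the input of one reduce node; scaling nodes up or down corresponds to changing the number of chunks and `num_reducers`.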
|
|
Interactive Computer Graphics:
Rasterizer, Ray Tracer, WebGL Terrain Modeling, Flight Simulation and Particle Effects
A collection of projects around interactive computer graphics: implementations of a rasterizer
and a ray tracer from scratch that take descriptions of a scene and output graphic images. Also
includes a collection of work with WebGL: a 3D model of an arbitrary landscape terrain built using
random fault generation, illumination via diffuse/specular lighting models, and erosion simulation; a
flight simulation over the generated terrain; and a simulation of particles under physical effects
such as momentum, gravity, and drag. For more details and demonstrations
of this work.
|
|
RealWorldBugDiscovery:
An Evaluation of Open Source Real World Bugs via Popular Bug Datasets
In this project, a comprehensive benchmarking study was conducted on the complexity of real-world
bugs in open-source Maven and Gradle projects. Popular bug datasets such as Defects4J, BugSwarm,
Bears, and QuixBugs were utilized to evaluate the quality of test suites generated by automated test
generation tools, specifically Randoop and Evosuite. Code statement coverage for these bugs was
analyzed using tools like Clover, IntelliJ IDEA, and JaCoCo. Suspiciousness scores for code statements
were calculated from the coverage reports to identify lines of code with the highest likelihood of
containing bugs. Results: YouTube.
|
|
Data Systems and Machine Learning:
Online Aggregation, HyperLogLog, Apache Arrow-Parquet ASCII Encoder, Flatbuffers and Shared Memory
OLA: Implemented online aggregation (OLA) to progressively obtain approximate query results for various Pandas DataFrame operations, such as filtered mean, grouped mean, and grouped sum.
Apache Arrow-Parquet Encoder: Implemented HyperLogLog to estimate the dataset cardinality. Modified the Apache Arrow source code to implement a new ASCII Encoder and Decoder for Parquet, supporting the Integer and Float data types.
FlatBuffers & Shared Memory: Worked with Google's FlatBuffers and Python's shared memory libraries to pass serialized DataFrames between notebook sessions, and performed various operations directly on the serialized DataFrames.
This improves data sharing and manipulation efficiency in Python: FlatBuffers provides faster serialization, and shared memory provides inter-process communication.
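As an illustration of the online aggregation idea mentioned above, a filtered mean can be refined batch by batch instead of being computed once at the end. The sketch below is not the project's implementation: it uses a plain Python list rather than a Pandas DataFrame to stay self-contained, the function name is hypothetical, and it assumes the rows arrive in random order so that early estimates are representative.

```python
def progressive_filtered_mean(values, predicate, batch_size):
    """Yield (rows_seen, running_mean) after each batch of rows.

    running_mean is the mean of the values seen so far that satisfy
    predicate, or None while no value has matched yet.
    """
    total = 0.0
    matched = 0
    seen = 0
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        for value in batch:
            if predicate(value):
                total += value
                matched += 1
        seen += len(batch)
        yield seen, (total / matched if matched else None)
```

A caller can stop consuming the generator as soon as the estimate is stable enough, which is the point of progressive results.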
|
|
Linux Kernel Development:
Kernel Page Fault Profiler, Rate Monotonic Scheduler, Computing Kernel CPU Time
Page Fault Profiler: A kernel-space virtual page fault profiler that monitors page faults and CPU run times via a virtual memory buffer and a delayed work queue at 20 samples per second, and manages process registration through the proc interface.
Rate Monotonic Scheduler: A real-time CPU scheduler as a Linux kernel module based on Rate-Monotonic Scheduling (RMS) for single-core processors, utilizing the Proc filesystem for user space communication.
Computing CPU Time: A Linux kernel module for process management using a doubly linked list and proc filesystem, featuring periodic CPU time updates via a workqueue and timer callbacks.
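For context on the Rate-Monotonic Scheduling entry above: RMS gives higher priority to tasks with shorter periods, and the classic Liu and Layland utilization bound gives a sufficient test for whether a task set is schedulable on a single core. The user-space sketch below only illustrates that rule and that bound; it is not the kernel module's code, and the function names and the (computation time, period) tuple layout are assumptions.

```python
def rms_priority_order(tasks):
    """Return tasks sorted from highest to lowest RMS priority.

    Each task is a (computation_time, period) pair; a shorter period
    means a higher priority.
    """
    return sorted(tasks, key=lambda task: task[1])


def rms_admissible(tasks):
    """Sufficient schedulability test using the Liu and Layland bound.

    Returns True when total utilization is at most n * (2**(1/n) - 1).
    A False result does not prove the task set is unschedulable.
    """
    n = len(tasks)
    if n == 0:
        return True
    utilization = sum(computation / period for computation, period in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)
```

The bound is conservative: a task set that fails it may still be schedulable, so a stricter response-time analysis would be needed for an exact answer.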
|
|
A Survey on Internet of Things Security:
Attacks, Solutions, Strengths and Limitations
International Conference on Artificial Intelligence and Machine Vision (AIMV), 2021
Presented a comparative analysis of benchmarks for the latest security frameworks from then-recent IoT
literature, advised by Dr. Anil Kumar Kakelli. Classified and critiqued existing IoT security frameworks based on their approaches to addressing the threat of malicious
nodes in heterogeneous device networks and on their general strengths and limitations.
|
|