A month ago, I completed my internship at KiXR, where I worked as a Research Intern in their AI Team on R&D for real-time lip sync in speech-driven avatars as part of the Kiksy.ai project.
Over the course of two months, I explored multiple approaches to improve the audio-to-visual pipeline, focusing on speed, accuracy, and scalability. My work involved:
🔹 Benchmarking different alignment and phoneme extraction methods using libraries such as Whisper, Gentle, Aeneas, Phonemizer, and ForceAlign.
🔹 Developing mappings from phonemes to visemes, incorporating a richer set of viseme shapes for more natural avatar animation.
🔹 Experimenting with machine learning approaches for phoneme/viseme prediction: from classical models (Random Forests) to deep learning architectures (GRUs, RNNs, and torchaudio’s Wav2Vec2).
🔹 Handling large-scale data preparation and training, including generating synthetic datasets, optimising preprocessing pipelines, and testing model performance under real-time constraints.
🔹 Gaining hands-on experience with Linux environments, Docker, PyTorch, and ONNX for deployment and optimisation.
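To make the phoneme-to-viseme step above more concrete, here is a minimal sketch of how timed phonemes from an aligner might be turned into viseme segments. It is not the mapping used at KiXR: the table `PHONEME_TO_VISEME`, the viseme labels, the `Segment` class, and the function `phonemes_to_visemes` are all hypothetical names chosen for illustration, and the input is assumed to be ARPAbet labels with start and end times in seconds.

```python
from dataclasses import dataclass

# Hypothetical ARPAbet-to-viseme table; labels and groupings are illustrative only.
PHONEME_TO_VISEME = {
    "P": "lips_closed", "B": "lips_closed", "M": "lips_closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open",
    "UW": "rounded", "OW": "rounded",
    "IY": "spread", "EH": "spread",
}
DEFAULT_VISEME = "neutral"  # fallback for phonemes missing from the table


@dataclass
class Segment:
    label: str    # phoneme or viseme label
    start: float  # start time in seconds
    end: float    # end time in seconds


def phonemes_to_visemes(phonemes):
    """Map timed phonemes to timed visemes, merging adjacent identical shapes."""
    visemes = []
    for p in phonemes:
        # Drop ARPAbet stress digits, e.g. "AA1" -> "AA".
        key = p.label.rstrip("012").upper()
        shape = PHONEME_TO_VISEME.get(key, DEFAULT_VISEME)
        prev = visemes[-1] if visemes else None
        if prev and prev.label == shape and abs(prev.end - p.start) < 1e-6:
            # Same mouth shape continues without a gap: extend the previous segment.
            prev.end = p.end
        else:
            visemes.append(Segment(shape, p.start, p.end))
    return visemes


aligned = [
    Segment("M", 0.0, 0.1),
    Segment("AA1", 0.1, 0.3),
    Segment("B", 0.3, 0.4),
    Segment("P", 0.4, 0.5),
]
for v in phonemes_to_visemes(aligned):
    print(v.label, v.start, v.end)
```

Merging consecutive phonemes that share a mouth shape keeps the avatar from re-triggering the same pose, which is one simple way to reduce jitter before any smoothing or blending is applied.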
This project provided me with the opportunity to work at the intersection of speech processing, real-time animation, and AI systems. Beyond the technical challenges, it also taught me the importance of iterative experimentation, reproducibility, and scalability in applied AI research.
A huge thanks to Mr. Satyabrata Panda, my mentor, for his guidance, and to Mr. Kiran Lakkapragada and the entire team at KiXR for giving me the opportunity to contribute to this cutting-edge work.
On my last day, I presented my work to KiXR’s senior leadership: CEO Kavita Jha, CBO Kiran Lakkapragada, CHRO Sirisha L., Product Development Director Rahul CP, and Senior Computer Scientist Satyabrata Panda. I was humbled by their appreciation for my contributions.
Excited to continue building in the space of AI, speech processing, and interactive avatars.