My AI Journey

A chronological narrative of my learning, projects, and explorations in the field of Artificial Intelligence.

2018

My journey into Artificial Intelligence began in 2018, sparked by a friend who was taking Andrew Ng's foundational Machine Learning course, where the programming exercises were done in Octave. At the time, I was a Senior Project Manager in a full-time operations role at a human resources services company; building software and writing code were entirely new to me. However, I had a profound and enduring interest in working with data—what is commonly called "Data Analytics"—and was particularly fond of analyzing economic, monetary, and environmental datasets.

While the mathematics was a personal challenge, I've always been drawn to difficult problems, viewing the effort to learn as a meaningful investment of time. With a curious mind, an engineering grounding in math, an understanding of data, but zero coding knowledge, I confidently began the course. It started normally, with a few lectures a week. The material quickly became fascinating as I learned about the connection between AI and the functioning of brain neurons, its relationship to perceptrons, and logic gates, the core of digital electronics and my favorite lecture series from my engineering studies.

I had discovered a field that combined all my interests and promised a path toward a new and meaningful career. Until then, I had been completely unaware of its deep connection to neurons and brainwaves, subjects I had always wanted to explore. This made the prospect of learning to program AI feel both fun and easy; little did I know what was ahead. My obsession with the course grew, and I found myself studying day and night, even in cabs, driven by a surprising new enthusiasm. Within a month or two, I had completed the 11-week course, though I had done very little actual coding. By then, I was aware that Python was the preferred language for AI and that PyTorch was gaining momentum.

This first, extensive course fundamentally changed my perspective on the future and the immense disruption AI could cause as it matures. When I say AI, I include all of its branches: Reinforcement Learning, Machine Learning, Deep Learning, and other algorithmic advancements. I knew I had to understand the history and the milestones, like the formulation of forward propagation and backpropagation, that formed the backbone of AI. I purchased the "Python Data Science Handbook" by Jake VanderPlas and began coding in Jupyter notebooks. I followed GitHub repositories, ran code locally in an Anaconda environment, and leaned on Stack Overflow. My passion for data visualization made learning libraries like NumPy, Pandas, and Matplotlib the best investment in my journey. Though I didn't finish the book, it provided a solid foundation.
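For flavor, here is a toy version of the kind of notebook exercise I practiced. The dataset is invented, but the NumPy-Pandas-Matplotlib workflow is the one I was learning:

```python
# Load data with pandas, compute with NumPy, plot with Matplotlib.
# The CO2 figures below are made up for illustration.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "year": np.arange(2010, 2020),
    "co2_ppm": 389 + np.cumsum(np.random.uniform(1.5, 2.5, 10)),
})
df.plot(x="year", y="co2_ppm", marker="o", title="Atmospheric CO2 (toy data)")
plt.ylabel("ppm")
plt.show()
```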

I realized that to master coding AI, I first needed to learn data science: a slight diversion, but a necessary one, as I have a habit of clarifying the basics before diving deep into a topic. I became active on X (formerly Twitter), following influential figures in the AI ecosystem like Jeff Dean, Jeremy Howard, and François Chollet, along with many researchers and practitioners. I was inspired by people like Jack Kelly, who left Google DeepMind to build nowcasting models that predict cloud cover, and therefore sunlight, for solar energy generation. Following his project and joining his community taught me a great deal about how AI/ML can improve existing systems, remove roadblocks, and act as a powerful catalyst for innovation.

2019

By the end of 2019, I was deep down the AI rabbit hole. I was taking a Reinforcement Learning MOOC from OpenAI engineers and reading recent research papers from top conferences like NeurIPS, ICML, and CVPR. I followed the work of researchers from renowned institutions such as MIT, Cornell Tech, Mila, and Google. To better understand the complex mathematics, I turned to MIT's OpenCourseWare lecture series and watched paper explanations on YouTube channels like "Two Minute Papers," "Machine Learning Tokyo," and "Machine Learning Street Talk." For coding, I learned Python from tutorials by Sentdex.

I was completing free, uncertified courses on Coursera and edX, and by the end of the year, I was knee-deep in Python, building AI models from scratch. My second book purchase, "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron, introduced me to the Google ecosystem. I was struck by how invested Google was in the technology, from providing TPUs on the cloud to developing TensorFlow and publishing foundational research, most notably the Transformer paper, that paved the way for today's LLMs. Simultaneously, professors like Christopher Manning and Fei-Fei Li at Stanford were doing amazing work on language and vision models, and taking some of their free courses formed the basis of my understanding of NLP.

2020

At the start of 2020, my focus shifted to Reinforcement Learning, and I took MOOCs from OpenAI researchers and other community experts. My job provided a real-world use case for these new skills. In March 2019, I had transitioned from an operations career to a Product Manager role for a SaaS product that handled background verification for employees in India. The sheer volume of records in India made cross-validating personal IDs and court cases a significant challenge.

In 2020, I was asked to work with the R&D team to build a face verification system that compared a user's selfie with the photo on their ID card and matched personal details read from the card's RFID chip. We successfully developed the system, routing results below a certain confidence threshold to human reviewers, and soon turned it into a product for other clients.
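For illustration only, here is a minimal sketch of that comparison logic using the open-source face_recognition library. It is a stand-in rather than the stack we actually built, and the 0.6 cutoff is that library's conventional default, not our production threshold:

```python
# Compare a selfie against an ID-card photo via face embeddings.
# Distances below the threshold count as a match; everything else
# is routed to a human reviewer, as our system did.
import face_recognition

def verify(selfie_path: str, id_photo_path: str, threshold: float = 0.6) -> str:
    selfie = face_recognition.load_image_file(selfie_path)
    id_photo = face_recognition.load_image_file(id_photo_path)

    selfie_encs = face_recognition.face_encodings(selfie)
    id_encs = face_recognition.face_encodings(id_photo)
    if not selfie_encs or not id_encs:
        return "review"  # no face detected: send to a human

    distance = face_recognition.face_distance([id_encs[0]], selfie_encs[0])[0]
    return "match" if distance < threshold else "review"
```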

My study of Reinforcement Learning, including concepts like control policies, feedback mechanisms, and agent environments, strengthened my belief that AI would be the catalyst for the next industrial revolution. This conviction led me to my third book, "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, famously recommended by Elon Musk. The book was dense with the higher-level mathematics at the core of building and understanding deep learning models. During this period, the lines between Machine Learning, Deep Learning, and Reinforcement Learning were blurry, which made for a fun and dynamic learning environment. It was also when I truly grasped the critical importance of data labeling. High-quality, use-case-specific data was rare, giving rise to billion-dollar companies like Scale AI and teaching me that the quality of the training data ultimately determines the quality of the model.
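The agent-environment loop at the heart of RL is easy to show in code. Here is a toy sketch using the Gymnasium API, with a random policy standing in for a learned one (my own illustration, not an exercise from those courses):

```python
# One episode of the agent-environment feedback loop: the agent acts,
# the environment returns an observation and a reward signal.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward = 0.0

for _ in range(200):
    action = env.action_space.sample()  # a learned policy would map obs -> action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward  # the feedback the agent learns from
    if terminated or truncated:
        break

env.close()
print(f"Episode return: {total_reward}")
```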

2021

As I began to understand how models learned, everything became clearer and more interesting. I was consuming information from hundreds of websites, GitHub repositories, and research papers weekly. I realized my next step was to get closer to the action. By the start of 2021, I was convinced I should immigrate to Canada, home to some of the world's best universities and AI research labs. My plan was to pursue a Ph.D. or, failing that, find a role in a research lab. Life, however, had other plans. I applied for and received my permanent residency, landing in Canada in October 2021.

On a different note, before moving, I had also co-founded and was running a small D2C FMCG brand in India. This entrepreneurial venture taught me how to build a company from the ground up and the immense difficulty of scaling. None of us had a background in FMCG or sales, which made the challenge even greater. I was involved in everything from building and maintaining the website to establishing sales, marketing, and shipping channels, all while working on product development.

2022

Life, as it often does, took a different turn. Driven by an interest in the privacy implications of Virtual Reality, I joined a U.S.-based non-profit. I began helping the founder build and run programs, conduct research, and advance a mission I believed in: safeguarding society from the risks of future technologies. My understanding that "data is the new oil" made it clear that data management and governance would be foundational for any company using AI/ML.

I became deeply engaged in my new role and new country, and for a time, I lost track of the rapid progress in AI. That changed when ChatGPT, powered by GPT-3.5, launched in November 2022. I learned about it in December, and by January 2023, I was using it to summarize transcripts from panels and keynotes at work. I trained a small team on prompt engineering, and with the help of a few volunteers, we summarized 45 session transcripts in a matter of days.
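We did all of this through the ChatGPT interface, but for the curious, the same workflow through the OpenAI Python SDK looks roughly like this. The prompt wording and model choice are my reconstruction, not what we used at the time:

```python
# Summarize a panel transcript with a single chat-completion call.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

def summarize_transcript(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Summarize this panel transcript into key points, "
                        "decisions, and notable quotes."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```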

2023

I was amazed by the power of Large Language Models (LLMs) and what OpenAI had unleashed. I had been following their research since 2018, from their GAN work to MuseNet, but the capabilities demonstrated by their text models were on another level. The groundwork had been laid back in 2017, when Google released the "Attention Is All You Need" paper and the Transformer architecture was born. OpenAI, once a non-profit, was now at the center of a revolution, and I was excited by their success and the accelerating race to build Artificial General Intelligence (AGI).

Other companies applied the same generative wave to images, and Midjourney rose to prominence. Microsoft, a long-time researcher in the field, deepened its partnership with OpenAI and launched Bing Chat (later rebranded Copilot), first integrated into Bing search and the Edge browser. I used it extensively to summarize webpages and format text, realizing how much mental workload could be offloaded to AI. Meanwhile, generating images with Midjourney, whether on the free trial or a paid plan, was only possible through their often-overloaded Discord servers.

In February 2023, Meta released LLaMA, the first major openly available LLM, sparking a global debate on how these powerful models should be built and governed. The race to build bigger and better models had officially begun. When OpenAI launched GPT-4 in March 2023, barely four months after ChatGPT, it was clear to everyone that the race to AGI was on. Amid the excitement, fears of existential risk grew. Elon Musk, Yoshua Bengio, and other prominent figures signed an open letter urging frontrunners like OpenAI to slow down and focus on understanding their systems, while Geoffrey Hinton left Google so he could speak freely about the risks. They had a valid point: explainability was a major challenge, and LLMs were, and still are, largely a black box.

This is also the period when the ethics of data collection came under scrutiny. People realized some companies were scraping the internet unethically for training data, leading to copyright disputes. OpenAI was publicly called out for its data practices, and Meta acknowledged using public posts from Facebook and Instagram to train its models. Another major problem that quickly became apparent was hallucination. LLMs would behave strangely in long conversations, providing false information and drifting off topic. A lawyer in New York famously submitted made-up case precedents from ChatGPT in a court filing.

To combat hallucinations, the industry shifted towards Retrieval-Augmented Generation (RAG). In a RAG system, a retriever first pulls relevant passages from a curated knowledge base (such as a vector database of document embeddings), and the LLM then grounds its answer in those passages, often with citations. While open-source frameworks like LlamaIndex aimed to help developers implement RAG, companies like Cohere offered enterprise-grade solutions.
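To make the pattern concrete, here is a minimal sketch of the retrieval half of RAG, using sentence-transformers for embeddings and brute-force cosine similarity in place of a real vector database. The model name and toy corpus are my own choices for illustration:

```python
# Embed a small corpus, retrieve the passages closest to a query,
# and paste them into the prompt so the LLM answers from them.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "RAG grounds LLM answers in retrieved documents.",
    "The Transformer architecture was introduced in 2017.",
    "Vector databases store embeddings for similarity search.",
]
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "What does RAG do?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` is then sent to an LLM, which answers from the retrieved
# passages (and can cite them) instead of relying on its memory.
```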

Following the launch of GPT-4, the community began building agentic frameworks like AutoGPT and BabyAGI, early attempts at letting an LLM plan its own sub-tasks and work toward a goal autonomously. I shared this dream and went down a spiral of learning about these systems. By the end of 2023, OpenAI had launched its Assistants API, and the global race intensified as China's tech giants, Alibaba and Baidu among them, released their own models.

Towards the end of the year, Google officially entered the LLM race with its Gemini family of models, billed as natively multimodal from the ground up. The period wasn't without controversy: images generated by Gemini soon sparked a debate on bias, an issue that has since been mitigated but not eliminated. This frenzy was fueled by hardware advancements, particularly NVIDIA's H100 GPUs and Google's TPUs, and NVIDIA's market value crossed a trillion dollars on the strength of them. With the rise of massive AI data centers, environmental concerns grew, though I took some comfort in the net-zero commitments made by major players like Google and Microsoft.

2024

By the start of 2024, I had integrated GenAI tools into my daily workflow for everything from online research and project planning to writing assistance and IT troubleshooting. I used a suite of products: Glasp for summarizing YouTube transcripts, Gamma for building presentations, and NotebookLM, Google's RAG-based research tool. New text, audio, and image models were released in rapid succession. The world was taken aback when OpenAI previewed Sora, its text-to-video model. Soon after, Google announced Veo, and other labs released models like Kling and Dream Machine.

Anthropic launched Claude 3 and, later, Claude 3.5 Sonnet, which became known for precision, safety, and fewer hallucinations. After extensive parallel testing, I found myself preferring Claude's tone and presentation, eventually switching to the paid version, with Claude 3.5 Sonnet becoming my most-used model. The feature race was on, with companies launching new capabilities to attract users. Anthropic's "Projects" feature, which grounds conversations in your own uploaded documents, was a key factor in my switch. Google's NotebookLM had its own viral moment when it introduced a feature that converts source documents into a podcast, a truly phenomenal tool.

It became evident that the time between model launches was shrinking, and conversations began about hitting the limits of available training data. The answer seemed to lie in building better reasoning models that used techniques like Chain-of-Thought (CoT), an approach OpenAI's o1 preview later baked into the model itself. OpenAI also shook the world with GPT-4o, its first natively multimodal model spanning text, audio, and images, which quickly became its flagship offering. These reasoning models became the new frontier. News of a Chinese lab releasing an open-weights reasoning model, allegedly by replicating OpenAI's techniques, highlighted how the gap between closed and open models, and between the US and China, was narrowing.
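At its simplest, CoT just means prompting the model to write out intermediate steps before its final answer, which makes errors visible and often improves accuracy. A generic illustration (not any lab's actual prompt):

```python
# A direct prompt versus a Chain-of-Thought prompt for the same question.
direct_prompt = "A train travels 120 km in 1.5 hours. What is its speed?"

cot_prompt = (
    "A train travels 120 km in 1.5 hours. What is its speed?\n"
    "Think step by step: state the formula, substitute the values, "
    "and only then give the final answer."
)
# With the CoT prompt, the model writes out speed = distance / time
# = 120 / 1.5 = 80 km/h, so any slip in the reasoning can be spotted.
```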

The year also saw the launch of Devin, marketed as the world's first AI software engineer. While GitHub Copilot offered code completion, Devin promised autonomous task completion. The space between the two was bridged by agentic IDEs like Cursor, which let users generate entire code segments from a prompt. Platforms like v0 and Replit emerged, showcasing the power of agents to build MVPs and applications. However, I learned that building a production-ready agentic application is still incredibly difficult and requires a deep understanding of the tech stack and best practices.

I was excited by the potential for organizations where humans and agents could work collaboratively. I discovered David Shapiro's YouTube channel, where he and his community were researching the socio-economic impacts of AI, including post-labor economics and Universal Basic Income (UBI). His Autonomous Cognitive Entities (ACE) framework was one of the first to address the safe integration of autonomous agents.

By the end of the year, I began learning to code agentic applications myself, choosing the CrewAI framework over LangChain for its gentler learning curve. It became evident that prompt engineering and evaluation would be the backbone of building these applications, with humans playing a crucial role in the loop. The year concluded with regulatory developments, as the EU's AI Act received final approval, setting a global precedent for AI governance.
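To give a feel for why CrewAI's learning curve is gentle, here is a minimal two-agent sketch. It assumes the crewai package is installed and an LLM API key is configured in the environment; the roles and tasks are illustrative, not from a real project of mine:

```python
# Two agents, two sequential tasks: one researches, the other writes.
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Collect key facts about a topic",
    backstory="An analyst who digs up reliable sources.",
)
writer = Agent(
    role="Technical Writer",
    goal="Turn research notes into a clear summary",
    backstory="A writer who values accuracy and brevity.",
)

research_task = Task(
    description="Research recent developments in agentic AI frameworks.",
    expected_output="A bullet list of findings with sources.",
    agent=researcher,
)
writing_task = Task(
    description="Summarize the research into a 200-word brief.",
    expected_output="A concise, well-structured brief.",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
result = crew.kickoff()  # runs the tasks sequentially by default
print(result)
```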

Let’s Connect

Interested in working together or discussing AI? I’d love to hear from you.

GitHub LinkedIn Email