Rachel Lim is an Engineer on the inference team at OpenAI, where she works on the large-scale systems that enable AI tools to run efficiently and reliably. A Singaporean, Rachel is based at the company’s headquarters in San Francisco. OpenAI’s Asia Pacific hub in Singapore works with local partners and businesses to expand access to AI and support its growing adoption across industries, reflecting the region’s increasing role in shaping how AI is used globally.
1. Explain to us what you do within OpenAI and its inference team.
As an engineer on OpenAI’s inference team, I work on the systems that serve our AI models in production. The inference team makes it possible for people to use the models and capabilities developed by OpenAI’s research teams, making them more efficient and reliable over time. For example, each time you make a request to ChatGPT, Codex (our agentic coding tool), or our application programming interface (API), billions of mathematical operations run to generate every single token that is produced. Our job is to make sure that runs as efficiently as possible on the computers we have.
Both research and deployment involve running a large number of computers in a coordinated way, and doing things well often requires understanding the machine learning (ML) operations happening under the hood. It sounds complicated, but a lot of the underlying principles are simple and come up in everyday life. For instance, it’s like going to the hawker centre with your family and splitting up so that you can order three dishes from three different stalls at the same time — this is parallelism. Or sometimes, if you’re at a restaurant and in a rush, you’ll ask for the bill before you’re done eating because you know it’ll take the staff time to prepare it, so you can leave as soon as you finish — this is pipelining. Running all of the computation behind a model is similar, except the parallelism and pipelining happen across the hardware that we use.

In a typical day at work, I spend my time looking at dashboards to understand how our systems are doing — whether they’re performing as expected or if things are slow because of errors — thinking about problems and how to solve them, discussing ideas or project milestones with collaborators, and reading and writing (or even deleting!) code. Thankfully, these days Codex writes a large amount of code for me.
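The hawker centre and restaurant analogies above can be sketched in a few lines of code. This is a toy illustration only, not OpenAI’s serving stack: the stall names, timings, and helper functions are all invented for the example.

```python
import concurrent.futures
import time

def order_dish(stall: str) -> str:
    """Simulate queueing at one hawker stall (a stand-in for one unit of work)."""
    time.sleep(0.1)  # pretend every stall takes the same time to serve
    return f"dish from {stall}"

# Parallelism: three family members queue at three stalls simultaneously,
# so the total wait is roughly one stall's serving time, not three.
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    dishes = list(pool.map(order_dish, ["chicken rice", "laksa", "satay"]))
parallel_time = time.perf_counter() - start

def prepare_bill() -> str:
    """Simulate the staff taking time to prepare the bill."""
    time.sleep(0.1)
    return "bill ready"

# Pipelining: ask for the bill *before* finishing the meal, so the staff
# prepares it while you eat and the two waits overlap instead of adding up.
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    bill = pool.submit(prepare_bill)  # staff starts on the bill now
    time.sleep(0.1)                   # meanwhile, we finish eating
    receipt = bill.result()           # by now the bill is (almost) done
pipelined_time = time.perf_counter() - start
```

Run serially, the three stalls would take about 0.3 seconds and the meal-plus-bill about 0.2; with parallelism and pipelining each finishes in roughly 0.1, which is the same reason model serving overlaps work across hardware rather than doing it one step at a time.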