Key Takeaways
- GenAI can enhance employee productivity while safeguarding data security with data redaction and locally-hosted models.
- Centralizing tools and aligning them with user behavior is critical for success.
- Adopting trends like multimodal inputs and open standards can future-proof AI strategies.
- Not all GenAI bets will pay off, so be deliberate with GenAI strategy and focus on business alignment.
- GenAI has evolved from the initial hype to practical application and the “slope of enlightenment”.
On November 30, 2022, OpenAI released ChatGPT. That release changed the way the world understood and consumed Generative AI (GenAI). It took what used to be a niche and hard-to-understand technology and made it accessible to virtually anyone. This democratization of AI led to unprecedented improvements in both innovation and productivity in many fields and business roles.
At Wealthsimple, a Canadian financial services platform on a mission to democratize financial access, there is excitement around the potential of GenAI. In this article, which is based on my talk at QCon San Francisco 2024, I will share some of the ways we’re leveraging GenAI to enhance productivity and the lessons that came out of it.
Our GenAI efforts are primarily organized into three streams. The first is employee productivity. This was the original thesis of how we envisioned LLMs could add value and it continues to be an area of investment today.
As we started building up the foundations and tools for employee productivity, this gave us the confidence to optimize operations, which became our second stream of focus. Here our goal is to use LLMs and GenAI to provide a more delightful experience for our clients.
Third, but certainly not least, there’s the underlying LLM platform, which powers both employee productivity and optimizing operations. We developed and open sourced our LLM gateway, which, internally, is used by over half the company. We developed and shipped our in-house personally identifiable information (PII) redaction model. We made it simple to self-host open source LLMs within our own cloud environment, as well as to train and fine-tune models with hardware acceleration.
LLM Journey – 2023
The first thing that we did in 2023 was launch our LLM gateway. When ChatGPT first became popular, the general public was not as aware of third-party data sharing as it is today. There were cases where companies were inadvertently sharing information with OpenAI, and this information was then being used to train new models that would become publicly available. As a result, many companies chose to ban employees from using ChatGPT to prevent this information from getting out.
At Wealthsimple, we believed in the potential of GenAI, so we built a gateway that would address security and privacy concerns while also providing the freedom to explore. The first version of our gateway did one thing: it maintained an audit trail. It would track what data was being sent externally, where it was being sent, and who sent it.
The gateway was available to all employees: it proxied the information from a conversation, sent it to various LLM providers such as OpenAI, and tracked that information. Users selected among the different models from a dropdown to initiate conversations. Our production systems could also interact with these models programmatically through an API endpoint from our LLM service, which also handles retry and fallback mechanisms.
After we built the gateway, we ran into a problem with adoption: there wasn’t much incentive to use it. Our philosophy at Wealthsimple is that we want to make the right way the easy way. We used a series of carrots and sticks to improve adoption, with an emphasis on the carrots.
One of the benefits of our gateway is we made it free to use: we paid all of the API costs. Second, we wanted to create a centralized place to interact with all of the different LLM providers. At the beginning, it was just OpenAI and Cohere, but the list expanded as time went on.
We also wanted to make it a lot easier for developers. In the early days of interacting with OpenAI, their servers were not the most reliable, so we increased reliability and availability through a series of retry and fallback mechanisms, and we worked with OpenAI to increase our rate limits.
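The retry-and-fallback idea behind a gateway like this can be sketched in a few lines. This is a minimal illustration, not Wealthsimple's actual implementation; the provider names and call signatures are hypothetical.

```python
import time

def call_with_fallback(prompt, providers, retries=2, backoff=0.5):
    """Try each provider in order; retry transient failures with exponential
    backoff before falling back to the next provider in the list."""
    last_error = None
    for name, call in providers:
        for attempt in range(retries + 1):
            try:
                return name, call(prompt)
            except Exception as err:  # in practice, catch provider-specific errors
                last_error = err
                time.sleep(backoff * (2 ** attempt))  # back off before retrying
    raise RuntimeError(f"all providers failed: {last_error}")
```

A gateway built this way can transparently route a request to a secondary provider when the primary's servers are unavailable, which is exactly the reliability gap we were closing in those early days.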
Alongside those carrots, we had some very soft sticks. The first is what we call nudge mechanisms. Whenever anyone visited ChatGPT or another LLM provider directly, they would get a gentle nudge on Slack saying: “Have you heard about our LLM gateway? You should be using that instead”. We also provided guidelines on appropriate LLM use which directed people to leverage the gateway for all work-related purposes.
Although the first iteration of our LLM gateway had a great paper trail, it offered very few guardrails and mechanisms to prevent data from being shared externally. But we did have a vision centered around security, reliability, and optionality. We wanted to make the secure path the easy path, with the guardrails to prevent sharing sensitive information with third-party LLM providers.
Guided by this vision, the next thing we shipped in June of 2023 was our own PII redaction model, which could detect and redact any potentially sensitive information prior to sending to external LLM providers. For example, telephone numbers are recognized by the model as being potentially sensitive PII, so they are redacted.
Figure 1: PII Redaction
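To make the idea concrete, here is a toy, rule-based sketch of redaction. Our actual PII redaction model is an in-house ML model, not a set of regexes; the patterns below are illustrative only.

```python
import re

# Illustrative patterns for two common PII types; a real system would use a
# trained model and cover many more categories.
PATTERNS = {
    "PHONE": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text):
    """Replace each detected PII span with a typed placeholder like [PHONE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The redacted text, rather than the original, is what gets forwarded to the external LLM provider.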
While this closed a gap in security, it introduced a different gap in the user experience. First, many users reported that the PII redaction model was not always accurate, which often interfered with the relevancy of the answers provided. Second, to effectively leverage LLMs in their day-to-day work, users needed to be able to use some unredacted PII, because that was the data they worked with. Going back to our philosophy of making the right way the easy way, we started to look into self-hosting open source LLMs.
For self-hosted LLMs, we didn’t have to run the PII redaction model. We could encourage people to send any information to these models, because the data would stay within our cloud environments. We spent the next month building a simple framework for self-hosting open-source LLMs on top of llama.cpp, a runtime for serving quantized models.
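As a rough sketch of what self-hosting with llama.cpp looks like, the commands below start its built-in HTTP server and query it. The model path, flag values, and port are placeholders, not our actual configuration.

```shell
# Serve a quantized GGUF model with llama.cpp's server binary.
# -m: path to the model weights, -c: context size, --port: listen port.
./llama-server -m models/model.Q4_K_M.gguf -c 4096 --port 8080

# The server exposes an OpenAI-style chat endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```

Because everything runs inside our own cloud environment, requests like this never leave our network boundary.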
Next we introduced a very simple semantic search as our first RAG API. We encouraged our developers and our end users to build upon this API and other building blocks we provided in order to leverage LLMs grounded against our company context.
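The retrieval step at the heart of a simple semantic search can be sketched as follows. A real RAG API embeds text with a model; here a bag-of-words vector stands in for the embedding so the ranking logic itself is visible. All names are illustrative.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Rank documents by similarity to the query; the top k become the
    context the LLM is grounded against."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

The retrieved passages are prepended to the prompt, which is what lets the LLM answer questions grounded in company context rather than from its training data alone.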
Even though many of our users asked for grounding, and it intuitively made sense as a useful building block within our platform, the engagement and adoption was actually very low. We realized that we probably didn’t make the user experience easy enough. There was still a gap when it came to experimentation and exploration. It was hard for people to get feedback on the GenAI products they were building.
In recognizing that absence of feedback, one of the next things that we invested in was our data applications platform. We built an internal service using Python and Streamlit. We chose that stack because it’s easy to use and it’s something many of our data scientists were familiar with.
This platform made it easy to build new applications and iterate on them. In many cases, these proof-of-concept applications expanded into something much bigger. Within just the first two weeks of launching our data application platform, we had seven applications running on it. Two of those seven eventually made it into production, where they are adding value by optimizing operations and creating a more delightful client experience.
As our LLM platform came together, we also started building internal tools that we thought would be very powerful for employee productivity. At the end of 2023, we built a tool called Boosterpack to provide employees with a personal assistant grounded against the Wealthsimple context.
Boosterpack allowed users to upload documents to create knowledge bases, either private or shared with other users. Once a knowledge base was created, users could leverage the chat functionality to ask questions about it. Alongside the question answering, we also provided a reference link to the knowledge source. This reference link was really effective for fact-checking and further reading, especially for the documents in our knowledge bases.
LLM Journey – 2024
2023 ended with a lot of excitement. We started the year by introducing our LLM gateway, introducing self-hosted models, providing a RAG API, and building a data applications platform. We ended the year by building what we thought would be one of our most useful internal tools ever. Then 2024 came as a bit of a shock.
Gartner’s hype cycle maps how expectations for emerging technologies evolve over time. This is very relevant for GenAI: in 2023, most of us were entering the peak of inflated expectations.
We were so excited about what LLMs could do for us and we wanted to make big bets in this space. But as we entered 2024, it was sobering for us as a company and for the industry as a whole: we realized that not all of our bets had paid off. We then evolved our strategy to be a lot more deliberate, focusing on the business alignment with our GenAI applications. There was less appetite for bets.
The first thing we did as a part of our LLM journey in 2024 was un-shipping something we built in 2023. When we first launched our LLM gateway, we introduced the nudge mechanisms, which were the Slack reminders for anyone not using our gateway.
Long story short, it wasn’t working. The same people were getting nudged over and over again, and they became conditioned to ignore it. Instead, what we found was that improvements to the platform itself were a much stronger driver of behavioral change.
Following that, we started expanding the LLM providers that we supported. The catalyst for this was Gemini. Around that time, Gemini had launched their 1-million-token context window models, and we were really interested to see how this could circumvent a lot of our previous challenges with the context window limitations.
A big part of 2024 was about keeping up with the latest trends in the industry. In 2023, a lot of our time and energy were spent on making sure we had the best state-of-the-art model available on our platform. We realized that this was a losing battle, because the state-of-the-art models were changing every few weeks. Instead of focusing on the models, we took a step back and focused on higher-level trends.
One emerging trend was multimodal inputs: forget about text, now we can send a file or a picture. This trend caught on really quickly within our company. We added a feature to our gateway allowing end users to upload either an image or a PDF, and the LLM would then drive the conversation from those inputs. Within the first few weeks of launching this feature, nearly one-third of our end users were using multimodal inputs at least once a week.
One of the most common use cases arose when people ran into issues with our internal tools. For a human developer, receiving a screenshot of a stack trace is an antipattern: you would much rather get the text version. While humans have very little patience for that sort of thing, LLMs embraced it. Pretty soon, we were seeing behavioral changes in the way people communicate, because the LLM’s multimodal inputs made it so easy to just paste a screenshot.
Figure 2: Sending an error screenshot to an LLM
Figure 2 shows an example of an error someone encountered when working with our BI tool. This is a fairly simple error. If you ask an LLM, “I keep running into this error message while refreshing MySQL dashboard, what does this mean?”, the LLM provides a fairly detailed explanation of how to diagnose the problem (see Figure 3).
Figure 3: The LLM Explains an Error Message
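A request like the one behind Figures 2 and 3 can be sketched as an OpenAI-style multimodal chat message: the screenshot is base64-encoded and sent alongside the text question. The function name and message shape here are illustrative, following the public chat-completions format rather than our internal API.

```python
import base64

def build_image_message(question, image_bytes, mime="image/png"):
    """Package a text question plus a screenshot into one multimodal message."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

The gateway forwards this message unchanged to any provider that accepts the format, which is why pasting a screenshot became as easy as pasting text.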
After supporting multimodal inputs, the next thing we added to our platform was Amazon Bedrock. Bedrock is AWS’s managed service for interacting with foundational large language models. It also provides the ability to deploy and fine-tune these models at scale. There was a very big overlap between everything we had been building internally and what Bedrock had to offer.
We had considered Bedrock back in 2023, but decided instead to build these capabilities ourselves. Our motivation at that time was to build up the confidence and know-how internally, to deploy these technologies at scale.
2024 marked a shift in our build-versus-buy strategy. We’re certainly more open to buying, but we have some requirements: security and privacy first; price and time to market second.
After adopting Bedrock, we turned our attention to the internal API that we exposed for interacting with our LLM gateway. When we first shipped this API, we didn’t think too deeply about what the structure would look like, which ended up being a decision we would regret.
Because OpenAI’s API specs became the gold standard, we ran into a lot of headaches with integrations. We had to rewrite a lot of code from LangChain and other libraries and frameworks because we didn’t offer a compatible API structure.
We took some time in September of 2024 to ship v2 of our API, which did mirror OpenAI’s API specs. We learned that as the GenAI industry matures, it’s important to think about what the right standards and integrations are.
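The payoff of mirroring the spec is that any client or framework that speaks OpenAI's API only needs a different base URL to talk to the gateway. The shapes below follow the public chat-completions format; the field values are illustrative.

```python
# An OpenAI-compatible request and response, as any integration expects them.
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}

response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hi there!"},
         "finish_reason": "stop"},
    ],
    "usage": {"prompt_tokens": 8, "completion_tokens": 3, "total_tokens": 11},
}

def extract_reply(resp):
    """The one accessor every OpenAI-compatible integration relies on."""
    return resp["choices"][0]["message"]["content"]
```

Once v2 matched this shape, libraries like LangChain could point at our gateway without the custom adapter code we previously had to maintain.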
Lessons
Over the past few years, we’ve learned many lessons and gained a better understanding of how people use these tools and what they use them for.
There is a very strong intersection between GenAI and productivity. In the surveys and the client interviews we did, almost everyone who used LLMs found that they significantly increased or improved their productivity.
Our internal usage was almost exclusively in three categories:
- Programming. Almost half of the usage was some variation of debugging, code generation, or just general programming support.
- Content generation or augmentation: “Help me write something. Change the style of this message. Complete what I have written”.
- Information retrieval. Much of this was focused around research or parsing documents.
We also learned a lot about behavior. One of our biggest takeaways this year, as our LLM tooling matured, was that our tools are most valuable when injected into the places where we already do work, and that moving information between platforms is a huge detractor. Having to visit multiple places for GenAI is a confusing experience; even as the number of tools grew, most people stuck with a single tool.
We wrapped up 2023 thinking that our Boosterpack tool was going to fundamentally change the way people use GenAI. That didn’t really happen. We had some good bursts in adoption and some good use cases, but it turned out we had actually created two different places for people to get their GenAI needs. That was detrimental for both adoption and productivity.
The lesson here is that we need to be a lot more deliberate about the tools we build, and we need to put investments into centralizing these tools. Regardless of what users said they wanted, the way they use these tools will often surprise us.
GenAI Today and in the Future
Wealthsimple really loves LLMs. Across all the different tools we offer, over 2,200 messages are sent daily. Close to a third of the entire company are weekly active users, and slightly over half are monthly active users. Adoption and engagement for these tools are strong, and the feedback we’re hearing is that they are helping employees be more productive.
Furthermore, the lessons we learned and the foundations we developed for employee productivity pave the way to a more delightful client experience. These internal tools provide the building blocks for developing GenAI at scale, and they’re giving us the confidence to find opportunities to help our clients.
Going back to the Gartner hype chart, in 2023 we were climbing up that peak of inflated expectations. 2024 was a little bit sobering as we made our way down. As we’re headed into 2025, I think we’re on a very good trajectory to ascend that “slope of enlightenment”. Even with the ups and downs over the past two years, there’s still a lot of optimism, and there’s still a lot of excitement for what next year could hold.