Transcript
Felix: I am Jessi. I work as a technical leader at a bank in Brazil, Itau Unibanco. I’m a former IT architect at Santander, a European bank. Technical leader is just a role; I continue to be a software engineer. I am a volunteer tech lead at a Brazilian NGO, and an instructor of the course Unraveling JavaScript Syntax. I am a community leader and podcast co-host. I love frontend.
Why Solutions?
Let’s start with solutions, and why. I believe there are two types of solutions: solutions that solve problems we really have, and solutions that solve problems we don’t have. We like the solution so much that we think, “This is so amazing. I want to bring this to my project,” but, in fact, we don’t need it. When we think about solutions, the first question is: what indicator do I want to pursue? Probably some of the solutions your software uses today are ones you don’t need. It’s just that a senior developer went to a conference or read an article and said, this tool will solve all my problems. What problems? What indicators? Keep this in mind as you watch this presentation. Maybe the solution to your problem is to build something internally. Maybe the solution to your problems is just to remove some tools. The solution is not always to add more things. Keep it simple.
Performance, Observability, and Indicators
When we think about performance, what is performance? Performance is how efficiently your web application or website loads and runs, optimized for speed and smooth operation to give the best possible user experience. Performance is, simply, giving the user the best experience. As engineers, we need to do all the things I listed in the notes. Remember, performance work should start from a bad user experience, real or likely. If it doesn’t, it’s just ego.
Now, observability. Observability is understanding the state of our systems from the outside, which helps you monitor and understand how the application performs. This matters because observability helps you achieve better performance in your applications. Of course, it depends. Performance and observability in frontend look like complementary things. Problems appear when, for example, an engineer focuses too much on observability or too much on performance. If you focus too much on performance, your application becomes a black box: it’s very fast, but you have no idea how it works. If you add too much observability, your application will be slow and you’ll probably lose your job, but at least you will have metrics.
The main point here is that these two aspects are critical in a frontend application. I know nobody cares about frontend; everybody just wants to look at backend. No. We in frontend are the user’s first impression of the application. We are very important. Remember this: we need to think about how to make the frontend performant, always with the user experience in mind. I’ve talked about performance and observability; now, indicators. What are indicators? Indicators define what we pursue and how we follow our metrics.
Changes happen all the time in frontend, and in backend too, but backend changes more slowly. When you write an API, you don’t change that API every month. When we build a frontend, every time a manager asks, what if we try another color for the button? You go and try the color of the button, and so on. The frontend changes more often than the backend, so its SLO and SLA indicators are the ones that change the most. This is important. It’s not a problem that indicators change, but we need to guarantee they change in a healthy way.
You have metrics. You have logs. You have everything, and your application is like a huge, enormous parking lot full of cars. You have a lot of data, everywhere. Metrics should be structured, from the start, to answer questions. I cannot answer questions with lots of information, because I cannot read all of it. If you have one, two, or three pieces of information, and they answer the questions you defined that you need to answer, that is fine. That is enough. We can have all these metrics, but when a problem occurs, none of them help us understand it. You open the dashboards and ask: what happened? No one can say.
You put 10 developers in a war room for 10 hours, and no one can say what happened. You bought 10 observability tools, and no one knows what happened when something goes wrong. Maybe the solution is to remove some tools and understand what you really need, and what your users really need. Here is a curious result from the Grafana Labs Observability Survey for 2023: most companies in the survey use 1 to 5 different tools, and 41% use 6 to 15. Imagine managing 15 different observability tools. This is the reality of the market. We in the software and technology industry consume a lot of new tools. Do you still have bugs in your application with 6 to 15 different observability tools? I do. Then something is wrong. Let’s keep it simple and focus on what we really need to obtain.
Our metrics must be structured to answer specific questions. Initially, your traces may simply not exist, and you trace things through logs. For example, I push a log for this API and take this ID. I take this ID and search for it in another tool to see what happened. Now I have a number of connections, so I take those and run another query, and you end up doing 10 queries to obtain one answer. In the financial market, one hour can mean 100 million reais, pounds, or dollars lost. We don’t have that time. What else? Logs. Imagine that I need five APIs to produce one answer. For example, I pay with a card, and to complete this payment I need to call five APIs, and every API has its own log structure. Where’s the problem? When I need to investigate what happened, it’s hard to understand, and I need to call the developer who created each API.
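One common way to avoid the “take this ID and search in another tool” loop is to generate a single trace ID in the frontend and send it with every call in the flow, so all five APIs can log the same value. The sketch below assumes the backends read the W3C Trace Context `traceparent` header; `makeTraceparent`, `tracedCall`, and the `FetchLike` type are hypothetical names for illustration, not part of the speaker’s setup.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> }
) => Promise<{ status: number }>;

// Random lowercase hex string of `bytes` bytes. Math.random is not
// cryptographically strong; that is acceptable for trace IDs in a sketch.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

// Trace Context forbids all-zero IDs, so regenerate in that unlikely case.
function nonZeroHex(bytes: number): string {
  let hex = randomHex(bytes);
  while (/^0+$/.test(hex)) hex = randomHex(bytes);
  return hex;
}

// Builds "version-traceId-parentId-flags": a 16-byte trace ID (32 hex chars),
// a fresh 8-byte span ID (16 hex chars) per call, and flags 01 (sampled).
function makeTraceparent(traceId: string): string {
  return `00-${traceId}-${nonZeroHex(8)}-01`;
}

// Calls every API of one user flow (e.g. "complete this payment") with the
// same trace ID, so their logs can be joined on one value. Returns the ID.
async function tracedCall(fetchImpl: FetchLike, urls: string[]): Promise<string> {
  const traceId = nonZeroHex(16);
  for (const url of urls) {
    await fetchImpl(url, {
      method: "POST",
      headers: { traceparent: makeTraceparent(traceId) },
    });
  }
  return traceId;
}
```

Each backend would then include the trace ID from the header in its own log lines; searching for one value returns the whole payment flow instead of requiring ten chained queries.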
If that developer has moved to another company, what do I do? Cry? I don’t need to cry. I could use the 15, 18, 19 tools that my boss bought for my team for observability and find the answer, and I can’t. And that is just the second problem. The third problem is metrics. We react slowly when something happens, and we are purely reactive: the sales team tells me that X doesn’t work, and we rush to resolve it. This is no way to live. We need quality of life, so try to change the mentality. We need visual tracing. This is really important: see that the flow goes through API 1, 2, 3, 4, and that this one is the problem, so you go directly to the right API. Your logs need to be standardized. This is a good reason to call a meeting, gather all the developers, and say, everyone, let’s agree on one standard way to log.
Then everything uses the same type of log. Your metrics need to be real time. Which tool should you use for this? It’s your choice. In my opinion, these three points are the most important in frontend. We need them. You might say, but I can look at the Core Web Vitals. I can look at Google Analytics. No, Google Analytics is not for us; it’s for the marketing team to gather insights. We need to use the right tool. We need logs, tracing, and metrics.
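Agreeing on “one standard way to log” can be as small as a shared type that every squad imports. A minimal sketch, assuming JSON lines and a field set chosen here only for illustration (`timestamp`, `level`, `app`, `traceId`, `event`, `data`); the names `StandardLog` and `createLogger` are hypothetical.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// One shape for every application, so logs from different squads can be
// grouped and queried the same way.
interface StandardLog {
  timestamp: string; // ISO 8601, UTC
  level: Level;
  app: string; // which application or micro frontend emitted it
  traceId?: string; // links the entry to the rest of the user flow
  event: string; // stable machine-readable name, e.g. "payment.submitted"
  data?: Record<string, unknown>;
}

// Returns a logger bound to one application; `write` decides where the
// JSON line goes (console, a collector endpoint, a test buffer).
function createLogger(app: string, write: (line: string) => void) {
  return (
    level: Level,
    event: string,
    data?: Record<string, unknown>,
    traceId?: string
  ): StandardLog => {
    const entry: StandardLog = {
      timestamp: new Date().toISOString(),
      level,
      app,
      event,
      // Optional fields are omitted entirely rather than logged as null.
      ...(traceId ? { traceId } : {}),
      ...(data ? { data } : {}),
    };
    write(JSON.stringify(entry));
    return entry;
  };
}
```

For example, `createLogger("checkout-mfe", console.log)` gives a squad a `log("info", "payment.submitted", { amount: 100 }, traceId)` function whose output has the same fields as every other squad’s.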
What Problems are Solved?
We’re trying to solve three real problems: data standardization, connectors and APIs, and bringing the right information to the right person. Why standardize? Once logs and information are standardized, we can easily group and analyze them. This is another important way to think: we standardize logs because that makes them easier to analyze. If there are machine learning engineers here, you’re probably already thinking of many ways to build on this more easily. Next, check that a tool connects easily to your systems before you bring it into your company. You’re here at QCon; you’ll probably hear about amazing tools. How do you connect these tools to your application?
You probably have 10 applications and other tools in your company. How do you connect all of them? If they don’t connect, you need to assign a specific developer to take care of each tool, and every time you need some information, you need to call a meeting. Let’s not have those meetings; we need to connect everything without depending on the engineer or owner of the tool. The last problem is designing information and distributing it according to the needs of the indicators. We need one specific type of information; our boss needs another.
Sometimes it’s the same information, but the way it’s communicated will be different. This is necessary too, because how can anyone make decisions without data? They need the data. Where is the data? With the engineering team. Go ask the engineering team. We are focused on our work, and someone starts asking us about many things that we could simply put in a dashboard, and say, this is all your information.
That person will be happy, because they can make decisions with confidence, backed by data. What else? I’m talking about bringing the right information to the right person. Imagine that I build this dashboard for an engineering team, and my boss says, I need data. You go and bring this enormous dashboard. Okay, you haven’t solved the problem. Remember: what is the most important thing they need? Put that into a single indicator. Your boss needs a holistic view of your application, and of all the company’s applications. If it’s just one point, it’s easier to make a decision. This is not easy; this is very hard. When we talk about observability, we need to support the data area. Make friends with data engineers. They are amazing.
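One way to turn a large dashboard into a single indicator is to report the share of requests that met an agreed latency target, which is how an SLO is often expressed. A sketch under that assumption; the function names `sloCompliance` and `describeCompliance`, and the 1,000 ms target and 99% objective in the example, are illustrative and not from the talk.

```typescript
// Share of requests (0..1) whose latency met the target: one number a
// manager can follow over time instead of reading the whole dashboard.
function sloCompliance(latenciesMs: number[], targetMs: number): number {
  if (latenciesMs.length === 0) return 1; // no traffic: nothing violated
  let good = 0;
  for (const ms of latenciesMs) {
    if (ms <= targetMs) good++;
  }
  return good / latenciesMs.length;
}

// Formats the indicator for a non-technical audience.
function describeCompliance(ratio: number, objective: number): string {
  const pct = (ratio * 100).toFixed(1);
  const status = ratio >= objective ? "OK" : "BELOW OBJECTIVE";
  return `${pct}% of requests within target (${status})`;
}
```

With latencies `[120, 340, 2500, 800]` and a 1,000 ms target, three of four requests qualify, so `describeCompliance(sloCompliance(latencies, 1000), 0.99)` reads “75.0% of requests within target (BELOW OBJECTIVE)”. Treating an empty window as fully compliant is a design choice; some teams prefer to show “no data” instead.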
Case Study: Micro Frontends
Let me show you a case study from one of the banks I worked at. What was my main problem? I was on the architecture team. What do I do? I build libraries, for example, an authentication library and a cryptography library. These libraries are general purpose. You can’t just download a library from the internet, because we are in a bank: we have compliance and security, and you are not allowed to download whatever you want. You need to use what is inside the bank. I provide these libraries. They need to be very lightweight, and we maintain them internally, which makes it easier to guarantee that work never stops. Open source is very nice. I love it.
But sometimes a developer chooses a library that has one maintainer and whose last commit was two years ago. Is it safe to put that in a very critical application? No, it’s not. We prefer to build everything we need in-house; it’s safer. Imagine I have squads 1, 2, 3, A, B, C, all from different areas. I am an architect and I have to support all these squads. It’s not just three; I probably have 3,000 to support. When something goes wrong in some project, what do I do? Do I have tracing for the application? No, because the squads only have their own metrics. They don’t have metrics for my library. When there is a problem, I need to debug the squad’s application. This is not good, because every time they have a problem, I need to stop, learn the whole application, and only then start solving the problem. What solution did we, as architects, think was good? We created another library, an interceptor. It’s a very lightweight library.
The purpose of this library is simple. We apply it in the micro frontend, like all the other libraries, and every time the micro frontend, or the shell it runs in, makes a request to a service, for example an API or a BFF, the library intercepts that call and sends it to a server, which receives and stores it. After that, whenever someone says, “I have a problem,” we first check whether the problem is with our library. If it’s not, we look at the other stats to find out what it is. This helps us, as a team, gain a lot of performance, because I just need to open a dashboard and see what happened. This is not a perfect solution, but remember, I work at a bank.
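The interceptor idea can be sketched as a wrapper around a fetch-style function that every micro frontend installs: it records what went out, what came back, and how long it took, then hands the record to a sender. This is a minimal sketch under that assumption; `withInterceptor`, `RequestRecord`, and the `send` callback (which in practice might POST to a collector server) are hypothetical names, not the bank’s actual library.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> }
) => Promise<{ status: number }>;

interface RequestRecord {
  app: string; // which micro frontend made the call
  method: string;
  url: string;
  status: number | null; // null when the request failed before a response
  durationMs: number;
  error?: string;
}

// Wraps a fetch-like function. Callers keep using it exactly as before;
// every call also produces one RequestRecord for `send`.
function withInterceptor(
  app: string,
  fetchImpl: FetchLike,
  send: (record: RequestRecord) => void
): FetchLike {
  // Observability must never break the application, so sender errors are ignored.
  const safeSend = (record: RequestRecord) => {
    try {
      send(record);
    } catch {
      // intentionally swallowed
    }
  };
  return async (url, init) => {
    const method = init?.method ?? "GET";
    const start = Date.now();
    try {
      const response = await fetchImpl(url, init);
      safeSend({ app, method, url, status: response.status, durationMs: Date.now() - start });
      return response;
    } catch (err) {
      safeSend({
        app,
        method,
        url,
        status: null,
        durationMs: Date.now() - start,
        error: err instanceof Error ? err.message : String(err),
      });
      throw err; // the application still sees its own error
    }
  };
}
```

Because the wrapper returns the original response or rethrows the original error, squads do not need to change their code; only the library’s owners look at the records, which is what lets them rule their library in or out quickly.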
I don’t have a budget, and when you try to bring in something new, you need a budget. Banks are rich for a reason: they don’t spend money all the time. It’s hard to get a new tool approved. If I develop something very simple and cheap, it’s easier to get it adopted. After this, I started showing all the teams how important observability is. It was a first, simple step, but it brought us good answers to our questions, instead of just asking, what happened?
Regulations and Compliance
Now I want to reinforce a point about the regulations and compliance we have in banks. You might say, I don’t work in a bank, so I can just buy a tool and everything is fine. If you work on a specific project, maybe you’re thinking about developing something yourself, if that solves your problem faster. It looks like everything is fine and you don’t have a problem. But you do. When you try to balance performance and observability, you have to deal with complexity, of course, because you need to understand how your application behaves and where performance bottlenecks might occur. This is a good thing, because it means we won’t lose our jobs to AI agents.
This is very hard, because you need a high-level view of your application and you need to understand things, not just process data. You need to understand and talk with people: your boss and your users. I think this brings more complexity to frontend; security is more complex in backend, in my opinion. The next problem is diverse user conditions. In backend, it doesn’t matter whether your user has an old phone or a new phone, but in frontend it matters to us: we have a variety of devices, network connections, and browsers. The other problem, which is very easy to understand, is data overload.
Think about it: I have 3,000 applications, I put my library in each one, and now I have 3,000 servers to manage. I know it’s very hard, but we need to keep finding ways to make everything simple. Then, of course, there are resource constraints. When we think about optimization and performance, we need to keep everything simple, because your application is legacy code. How do I deal with legacy code? How do I deal with an application whose developers are all juniors? I cannot add something more complex, because they would not even be able to start developing, since they don’t know how to use it. The world is a box of surprises. We need to be prepared to deal with all of this.
Degradation between Frontend and Backend
The main point I bring here is not just about the frontend, or just about performance and observability, but about something called degradation between the frontend and the backend. This is about discovering potential performance degradation when you realize that people aren’t using what you built the way you expected. Imagine you design a great service, but in the frontend it isn’t used, and you need to discover this as quickly as possible to fix it or improve the user’s experience. You need to pay attention to this point between frontend and backend. With debugging, you cannot observe this; it is only possible with observability. This is where performance really matters. For example, I had a service that was very fast, but my users didn’t feel it was safe to use, because when something is being analyzed, at least in Brazil, users expect it to really take some time.
In the user’s mind, they need to wait one or two seconds. If it happens too quickly, they probably think nothing happened. Sometimes the performance we need is to serve it more slowly, because that way users feel much more comfortable using the service. Performance is not always about doing things faster; it is about doing things in a way that brings confidence to the user of the application.
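If the goal is confidence rather than raw speed, one way to express “serve it more slowly on purpose” is to keep an “analyzing” state visible for a minimum time without delaying a response that is already slow. A sketch, assuming the UI awaits a promise; `withMinimumDuration` and the 1,500 ms value below are illustrative choices, not numbers from the talk.

```typescript
// Resolves with the task's value, but never earlier than `minMs` after the
// call. A fast task waits for the floor; a slow task gets no extra delay,
// because the timer and the task run in parallel. If the task rejects,
// Promise.all rejects right away, so errors are not artificially delayed.
async function withMinimumDuration<T>(task: Promise<T>, minMs: number): Promise<T> {
  const floor = new Promise<void>((resolve) => setTimeout(resolve, minMs));
  const [value] = await Promise.all([task, floor]);
  return value;
}
```

For example, a hypothetical `analyzeCredit()` call could be awaited as `withMinimumDuration(analyzeCredit(), 1500)` while a spinner is shown. Whether a floor like this helps is exactly the kind of thing to validate with observability data and user research rather than assume.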
Perfection is Stagnation
Of course, perfection is stagnation. Look at what I showed you that I do at the bank where I work. I could use a huge tool and have metrics, logs, and so on, but that isn’t possible there. Sometimes it’s better to start with something simple and then develop more complex things. For example, right now I have a very simple dashboard that shows me just a few simple metrics. This is a good start. You need to start from somewhere.
Another good point: really learn how to use your tools. This dashboard and the previous one come from the same tool, and this one looks much better. It’s the same tool producing both dashboards. Sometimes you don’t need to buy another tool; you just need to learn to use the tools you already have well. Imagine bringing something like this to your boss, and they say, this solves all my problems. When they go to a meeting with investors, they open your dashboard, because your dashboard is easy to understand for someone who doesn’t work directly with technology. This is the main power of what we do in technology: empowering people with information so they have good solutions and good insights when they talk with others.
Summary
What’s next? Let me talk about a bank’s reputation. Banks are slow. I know a lot of developers who leave banks because they’re slow: it’s hard to change things, everything is legacy code, and they want to go somewhere more modern. Banks are slow for a reason: their reputation is at stake. If we have a security problem, for example, the bank may fail, and that is not what we want. The role of observability data is to bring more speed to decisions, because making mistakes costs reputation. If you bring data to your boss, if you bring observability to your application, you will probably speed up decisions and make your life better.
Questions and Answers
Participant 1: Frontend communication is not only synchronous; there are asynchronous events and all that. When we talk about frontend observability, how do you trace a frontend request to the backend when the communication is asynchronous?
Felix: When we deal with asynchronous code, as an engineer, the first thing I need to understand is whether all the asynchronous usage is really needed, because sometimes we get extra trouble from async that isn’t used correctly. The second thing is to understand what type of metrics you want. Let me give an example. I have a payment service and an update-client service. I need to update the client only when the payment is guaranteed, so I need to wait until it’s done before running the next service. In my experience, collecting the logs and putting them on a timeline helps. For example, I built a dashboard with a timeline. Browser developer tools have a timeline that shows when things happen, and I tried to do something similar in my dashboard, because it’s hard to deal with asynchronous operations. At least in my case, this helped.
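The timeline approach for asynchronous flows can be sketched as: collect the events that share one correlation ID, sort them by timestamp, and express each one as an offset from the first event, similar to the waterfall view in browser developer tools. The event shape and the `buildTimeline` name are assumptions for illustration; the payment and update-client steps mirror the example in the answer.

```typescript
interface FlowEvent {
  correlationId: string; // same value across every service in the flow
  service: string; // e.g. "payment", "update-client"
  name: string; // e.g. "started", "finished"
  timestampMs: number; // epoch milliseconds when the event was logged
}

interface TimelineRow {
  service: string;
  name: string;
  offsetMs: number; // time since the first event of the flow
}

// Reconstructs the order of asynchronous steps for one flow. Events can
// arrive out of order (queues, retries, callbacks), so we sort by time.
function buildTimeline(events: FlowEvent[], correlationId: string): TimelineRow[] {
  const flow = events
    .filter((e) => e.correlationId === correlationId)
    .sort((a, b) => a.timestampMs - b.timestampMs);
  const first = flow[0];
  if (!first) return [];
  return flow.map((e) => ({
    service: e.service,
    name: e.name,
    offsetMs: e.timestampMs - first.timestampMs,
  }));
}
```

Rendered as rows, a flow might read `0 payment.started`, `300 payment.confirmed`, `320 update-client.started`, making it visible that the client update waited for the payment. This relies on timestamps from different machines being comparable; clock skew between services can reorder events that happen close together.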
Participant 2: Do you have any recommendations for differentiating slowness caused by the user’s connection or device from latency in the frontend itself?
Felix: It depends on whether it’s necessary for the application. What we try to do is understand the types of users. For example, I can obtain the browser type, so I have information about the device type, and this helps me refine the information and classify the users. This is expensive to build, but you can, for example, create a dashboard that classifies users by phone model or browser type. This gives you an idea of these issues. For example, 30 clients’ requests are slow, but they use old phones; everything is fine, there is nothing wrong. This is why we need to refine the data and understand who the application’s users are, because this type of problem can be solved very easily if you can classify the users easily.
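Classifying users before judging latency can be sketched as grouping samples by a segment (browser, device class, connection type) and comparing the median per segment, which is less distorted by a few extreme requests than the mean. The sample shape, the segment labels, and `medianBySegment` are illustrative assumptions; how the segment is derived (for example, from the user agent) is left out.

```typescript
interface LatencySample {
  segment: string; // e.g. "android-old", "ios-recent", "desktop-chrome"
  durationMs: number;
}

// Median of a non-empty list; averages the two middle values for even sizes.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median latency per segment, so "slow only on old phones" can be told apart
// from "slow for everyone" (which points at the frontend or backend instead).
function medianBySegment(samples: LatencySample[]): Map<string, number> {
  const groups = new Map<string, number[]>();
  for (const s of samples) {
    const list = groups.get(s.segment) ?? [];
    list.push(s.durationMs);
    groups.set(s.segment, list);
  }
  const result = new Map<string, number>();
  groups.forEach((values, segment) => result.set(segment, median(values)));
  return result;
}
```

If old Android phones show a median of 2,500 ms while desktop Chrome shows 400 ms for the same screen, the slowness is most likely on the device side; if every segment is slow, the problem is more likely in the application itself.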
Participant 3: You spoke about performance, tracking errors, and identifying bugs. I’d like to know your views on how I can use these to track user behavior, how users interact with the application, to improve the user experience.
Felix: You can use a RUM (real user monitoring) agent. It’s a great tool to see in real time what your users are doing, and it helps a lot. You can also use, for example, a tool like Hotjar to see how your users use your application. That has helped me too. Another thing that helps me understand user behavior is taking data analytics insights together with my performance and observability insights and bringing them together for discussion. Some things, at least in my experience, you cannot understand just by reading data. You need to talk with people, understand the context, and connect things. I think this is the hardest part of frontend: you need to connect market information, business information, and software information, and then you can support decisions.
Participant 4: If you had to write the interceptor library you mentioned from scratch, what would be the first metric you would implement?
Felix: Keep it simple. I would start with logs, because when we start writing something like this, we need to understand who our audience is. My audience is developers, and a log is the most natural thing for a developer to read. So we start with a simple log: I receive a request, what is inside it? When the response returns, what is inside it? I need to group these two points, the request going in and the response coming out. That is the first thing we do.