Transcript
Mariia Bulycheva: My name is Mariia. I'm a Senior Machine Learning Engineer at Intapp, where on a daily basis I mainly focus on building agentic AI, but agentic AI is not the topic of today's presentation. Today I'm going to talk about the work that I did at my previous employer, which is Zalando, where we developed and trained a graph neural network to improve our existing recommender system, to make it more personalized, to optimize for longer-term metrics, and basically improve our system.
Scope
The agenda for today's talk. First of all, I will describe how we can set up a recommender system on a graph. How many of you are familiar with machine learning concepts overall, and with graph neural networks in particular? We'll set up a recommender problem on a graph. Then I will describe how we prepare the data, how we convert tabular data into a graph, and what specific differences the graph representation brings. Then I will do a deep dive into the training process on a graph, which is also very different from training on tabular data. I will describe the training process pitfalls that we encountered, how we tried to roll it out into production, and which obstacles were there. I'll present our offline evaluation results, and conclude with suggested future steps, if you're interested in this topic.
Recommendation Task as a Problem on a Graph
First of all, the problem statement. How many of you actually know Zalando and what we do? Zalando is an online shopping platform. You can think of it as a European version of Amazon. Basically, people come to Zalando to buy shoes, clothes, lifestyle products, beauty products. The user journey, of course, starts from the landing page of the website or the opening page of the application. Because it's the starting point of the user journey, it's very important what kind of content we show there. We want the content to be highly relevant and highly engaging so that the user is interested in what they see on the landing page and continues clicking, because otherwise they will just drop out and find it useless, and that would be a bad user experience.
Here on this slide, you can see different types of content that we show on the landing page. It can be a specific shoppable item like a pair of shorts. It can be a carousel of different similar products, like a carousel of sneakers or jeans. It can also be different videos. We have videos that are curated specifically by Zalando, which present video coverage of a special pair of shoes or special beauty products. We also have videos featured by our creators. You can subscribe to be a creator for Zalando; then you get shipped different products and you create looks, record videos, post them online, and then users can basically shop for these looks. All these pieces we call content at Zalando. When I say the word content, it's quite broad. It can be a static image. It can be a carousel of images. It can be a video. What's important is that every piece of content is associated with a particular article. In e-commerce, we call them SKUs. Yes, a pair of shorts would be an SKU or an article.
Our task is this: we have an algorithm that selects around 2,000 of these content pieces. Then our algorithm has to score these 2,000 pieces and choose only the top 40 most promising ones to show to the user. We need a scoring model that ranks these pieces of content according to the probability of a click. Because if a piece of content generates a click, it means a continuation of the user journey. If it's sponsored content, which is basically an ad, that's direct revenue for the company. It's very important.
Where were we at the point when we started thinking about graph neural networks? We actually already had a quite well-performing system. What we stumbled upon is that we could not improve it. It was running online, training quite well. Of course, you want to improve your system. We wanted to steer towards longer-term engagement and long-term metrics, such as user retention or final purchases. We could not do it with classic deep learning models. This is when we started thinking about graphs, because a graph is conceptually quite different from the standard tabular way of thinking about data. The reason we thought about graph neural networks for our case is, first of all, that user engagement on any digital platform is very naturally represented as a graph. You can think of users and content as nodes.
Then links would be different kinds of interactions between user and content. If you're talking about Spotify, you can think of play or skip. These would be different kinds of links. Then tracks and users would be your nodes. In the case of Zalando, of course, it's different shoppable items and users. Different actions would be clicks. Or we can also go beyond clicks, like add to cart, add to wishlist, and so on.
Another important thing about graphs is that they explicitly model higher-order interactions and higher-order relations, because on a graph, you can track multi-hop connections. Things like, for instance, friends of friends on Facebook. Or items that are often bought together on an e-commerce platform. A graph gives you this possibility of tracking how different items are connected through multiple links. You can also embed more data into your graph, because you can add features to your nodes.
For instance, you can add user demographics data as node features. You can separately train image representations and use them as content node features. This is what we actually have at Zalando. We have a separate pipeline that basically creates embeddings, which are projections of articles into a latent space, where articles that look similar are close together. Basically, this is just visual similarity. What's important is that these embeddings only capture intrinsic characteristics of the articles. They do not represent anything about how the articles are actually connected to the user: how frequently they're bought, how frequently they're viewed. We can have, for instance, two red dresses, long sleeves, knee length, which have totally different prices and totally different purchasing histories, but in this latent space of images, they would be very similar. What's important about training graph embeddings is that there you actually get their contextual representation: how they are connected to the users and other pieces of content.
Last but not least, as I already mentioned a bit before, we're talking about heterogeneous graphs. We can have multiple types of nodes and multiple types of links. We can go beyond just user and content nodes. We can talk about brand nodes, because sometimes a user can specifically indicate that they like a particular brand. There's this option, follow, on Zalando, which is a direct signal from the user that they like this brand. Then it can be embedded into the graph. Last but not least, links can have weights on a graph, which for us means that we can actually model such things as recency or freshness of connections. Or, for instance, for videos, we can model the watch rate as the weight of the watch link. The weight would represent how long the user watched a particular video. Now, the recommendation system on a graph.
As I already said, here you can see that these orange nodes are users and the purple nodes are content. We have these gray links, connections in the past. We have data about user interactions in the past, like which content they clicked on, which content they viewed. We want to predict these connections in the future. More specifically, we want to predict clicks. We want to predict clicks given a view. If we show this piece of content to the user, how high is the probability that they will actually click on it?
Data Preparation
If you want to train a graph neural network, the first thing you need to think about is how to convert your existing data into a graph, if it's not already in a graph database. Usually, if you're at the starting point, you first need to think about how you create this graph database. In this case, we did not create a graph database per se, although there are solutions that allow you to do that, like Neo4j. What we did is prepare this graph as a separate pipeline before training. We store user logs as a table. When a user comes to the homepage, they're presented with 40 different pieces, so we have 40 rows, each pairing the user with a particular content piece.
Then we have two labels, view and click. View is marked as 1 if a content piece appeared in the viewport for at least 3 seconds; that means there was a successful view. We registered that the user actually viewed this item, either because they scrolled down or maybe it was at the top of the viewport initially. Then the other label is click. A click can, of course, only happen if there was a view. If the user clicked on the item, then we register label 1. If not, it's label 0. A very important thing with graphs is that the train and test graphs have to be fully disconnected. With tabular data, it's obvious. You just take it row by row, separate training rows from test rows, and you're done. With a graph, if initially you have one single graph, it's very hard to dissect the training data. What we did in our case is prepare the train graph and the test graph separately. We take seven days of user activity for the train graph and one consecutive day of user activity for the test graph.
As I said, there are two types of nodes, client IDs and entity IDs. Initially, we had two types of links, viewed and clicked. Then, to simplify our modeling approach, because we're only interested in predicting clicks given a view, we decided to keep only the viewed links and throw away the clicked links. The viewed links have labels 0 and 1. A viewed link has label 0 if there was no click given this view, and label 1 if there was a click given this view. It allowed us to simplify the modeling but still achieve what we wanted.
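As a small illustration of this simplification (a minimal sketch with hypothetical column names, not our actual pipeline code), the logged impressions can be reduced to viewed edges whose 0/1 label is the click outcome:

```python
import pandas as pd

# Hypothetical impression log: one row per (user, content piece) shown on the homepage.
logs = pd.DataFrame({
    "client_id": ["u1", "u1", "u2", "u2"],
    "entity_id": ["c1", "c2", "c1", "c3"],
    "view":      [1, 1, 0, 1],   # 1 if the piece was in the viewport for at least 3 seconds
    "click":     [1, 0, 0, 0],   # a click can only happen given a view
})

# Keep only successful views; the click column becomes the 0/1 label of the "viewed" edge.
viewed_edges = (
    logs[logs["view"] == 1][["client_id", "entity_id", "click"]]
        .rename(columns={"click": "edge_label"})
)
print(viewed_edges)
```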
Here are some stats about the training and test graphs, just to give you some idea of how large the datasets we're talking about are. For these seven consecutive days, we have around 5 million different clients coming online. Of course, they can come back multiple times, but we're talking about distinct nodes and distinct clients. We have around 12,000 entity pieces that we managed to show on the homepage during this time. We have more in our library, but this is how much we show and people actually engage with.
Then there are around 20 million viewed edges and 1.5 million clicked edges. This means we have 20 million edges overall, of which 1.5 million have label 1 and the rest have label 0. The test graph, which represents data for only one day, of course has fewer client and entity nodes. You can see 1 million clients, 7K entities, 3 million views, out of which around 200K clicks. Then all these nodes actually have features, which of course makes the graph even larger. Every node has a 25 by 128 feature matrix. I will explain where this matrix comes from. For the user, we take the 25 last purchased items and we take the embeddings for these items, these image embeddings I was talking about before.
Basically, a projection to this latent space where similar images are close together. The 25 latest purchased items for the users, and then for the content, the 25 items associated with this piece of content. If there are fewer than 25, it's just padded with zeros. If it's just one SKU, it would be one embedding and the rest would be zeros. The dimension of the embedding is 128. This is where this 25 by 128 dimension comes from. We experimented with two libraries, Deep Graph Library, which is based on TensorFlow, and PyTorch Geometric, which is based on PyTorch. If you are at the starting point of experimenting with graph neural networks, I would highly recommend Deep Graph Library because it's more low-level. It forces you to go deeper into understanding how a graph neural network actually works, because in PyTorch Geometric a lot of it is already implemented, and if you use it, you don't really see the different pitfalls of a graph neural network.
First, we started with Deep Graph Library, and I found it actually very useful. Then, of course, it turned out that at some point it gets too complicated, so we switched to PyTorch Geometric. Although our downstream system runs on TensorFlow, it was actually no problem. Everything combines well together. We used PyTorch Geometric for converting the data into a graph and for training the graph neural network itself. How data preparation works is that you first convert your data into tensors, and then these tensors are converted into this HeteroData structure, which is a special data structure in the PyG library.
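To make this concrete, here is a minimal sketch of how the zero-padded 25 by 128 feature matrices, the viewed edges with their click labels, and the HeteroData object could be assembled in PyTorch Geometric. Sizes, tensors, and the ToUndirected step are illustrative assumptions, not our production code:

```python
import torch
import torch_geometric.transforms as T
from torch_geometric.data import HeteroData

EMB_DIM, MAX_ITEMS = 128, 25

def pad_features(item_embeddings: torch.Tensor) -> torch.Tensor:
    """Stack up to 25 item image embeddings into a fixed 25 x 128 matrix, zero-padded."""
    feats = torch.zeros(MAX_ITEMS, EMB_DIM)
    n = min(len(item_embeddings), MAX_ITEMS)
    feats[:n] = item_embeddings[:n]
    return feats

data = HeteroData()

# One 25 x 128 feature matrix per node (tiny illustrative sizes; the real graph has ~5M clients).
data["client"].x = torch.stack([pad_features(torch.randn(3, EMB_DIM)) for _ in range(1_000)])
data["entity"].x = torch.stack([pad_features(torch.randn(1, EMB_DIM)) for _ in range(200)])

# "viewed" edges between clients and entities, each carrying a 0/1 click label.
src = torch.randint(0, 1_000, (5_000,))   # client indices
dst = torch.randint(0, 200, (5_000,))     # entity indices
data["client", "viewed", "entity"].edge_index = torch.stack([src, dst])
data["client", "viewed", "entity"].edge_label = torch.randint(0, 2, (5_000,)).float()

# Add reverse edges so messages can also flow from entities back to clients,
# and drop the label that gets copied onto the reverse edge type.
data = T.ToUndirected()(data)
del data["entity", "rev_viewed", "client"].edge_label
```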
Training Process on a Graph
The training process on a graph, which is the most interesting part. Here you can see the overall architecture of a GNN trained end-to-end to predict the click: to predict that the user, which is the pink box on the left, will click on the content, which is the yellow box on the bottom left. Then the system spits out the probability that they will click on this content. The first layer is just feature pre-processing. Like I said, every node has this 25 by 128 matrix, which we want to somehow pre-process at the beginning. We experimented with that. For users, we found that an LSTM layer actually works quite nicely, but you don't have to do that. You can do a simple mean or max pool, like we did, for instance, for the content nodes.
Then the most important part is in the middle. Here we have three GNN layers. You can have more, you can have fewer. I would say this is a hyperparameter you experiment with. These three GNN layers, you can think of them as, for instance, convolutional layers in a convolutional neural network, if you're more familiar with that. You put your features through these layers. Then basically every node gets a new numerical representation, which we call the user and content embedding, which you can then put through some simple classifier, like a dot product followed by a sigmoid, and you get the probability of the click link: how probable it is that the user will click on this specific piece of content.
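To make the architecture more concrete, here is a rough PyTorch Geometric sketch along these lines, continuing the hypothetical data object from the data-preparation sketch. It mean-pools the 25 by 128 node features (the LSTM option for user nodes is omitted for brevity), stacks three GraphSAGE layers made heterogeneous with to_hetero(), and scores client/entity pairs with a dot product and a sigmoid. Treat it as an assumption-laden simplification, not the production model:

```python
import torch
from torch_geometric.nn import SAGEConv, to_hetero

class GNNEncoder(torch.nn.Module):
    """Three message-passing layers; the layer count is a hyperparameter to experiment with."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden)   # lazy input sizes, filled in on first call
        self.conv2 = SAGEConv((-1, -1), hidden)
        self.conv3 = SAGEConv((-1, -1), hidden)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        return self.conv3(x, edge_index)

class ClickModel(torch.nn.Module):
    def __init__(self, metadata, hidden: int = 128):
        super().__init__()
        self.encoder = to_hetero(GNNEncoder(hidden), metadata, aggr="sum")

    def forward(self, batch):
        # Feature pre-processing: mean-pool each node's 25 x 128 matrix into one 128-d vector.
        x_dict = {node_type: x.mean(dim=1) for node_type, x in batch.x_dict.items()}
        z = self.encoder(x_dict, batch.edge_index_dict)
        # Simple "classifier": dot product of client and entity embeddings, then a sigmoid.
        src, dst = batch["client", "viewed", "entity"].edge_label_index
        logits = (z["client"][src] * z["entity"][dst]).sum(dim=-1)
        return torch.sigmoid(logits)   # probability of a click given a view
```

The edge_label_index used by the decoder is the set of supervision links attached to each batch by the loader described further below.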
Now let's go into a bit more detail about what's happening in these three GNN layers. What is happening in these magic layers? First, we have initialization. Every node has this matrix of features. The important thing is that we do not store these features in the graph. We populate them on the fly while training. We only store indices pointing to these 128-dimensional vectors so that the graph is not so heavy; it's more lightweight. We populate these features on the fly. Then what happens next is that we sample the neighbors. Conceptually, what is happening in the GNN is that you have a node and you have other nodes that are connected to this node via links. You want to pass the features of these neighboring nodes to your node of interest.
In principle, you could pass the features from all of the nodes, but this grows exponentially: you take all the nodes in the first neighborhood, then all the nodes in the second neighborhood, and so on. You want to do some sampling. In our case, we just did random sampling, but of course you can go beyond that and be smarter about it. You aggregate the features of these neighboring nodes, pass them through trainable matrices to the node in the middle, and concatenate with your initial features. This is basically how you get the new embedding of this node, which you can then dot product to get the eventual probability of the click. This is probably easier to understand looking at this picture. Here I'm considering just two layers for simplicity. The image on the left is step one, random sampling. The purple node in the middle is the one we're interested in, the one for which we are computing the embedding. You can see these dotted circles, which are the first neighborhood and the second neighborhood. The purple nodes are the ones that we sampled via our random sampling strategy. The white ones are the ones that we will not consider. Step two is the message passing: two GNN layers, two neighborhoods, everything in the same quantity.
First, we start from the most distant nodes, because eventually we want to end up in our middle node. We start message passing from the second hop away, which is these purple nodes. The purple nodes pass their features. When I say pass, I mean that they go through these trainable matrices, which adjust during training and figure out the best coefficients. They pass them to the brown nodes and concatenate. The brown nodes, again, pass them through other trainable matrices, concatenate with the features of the purple node, apply a dense layer, and then you get this x_1, x_2, ..., x_n embedding of the purple node.
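As a toy NumPy sketch of one such message-passing step (GraphSAGE-style; the weights here are random, whereas in the real model they are the trainable matrices just mentioned):

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_step(h_self, h_neighbors, W_self, W_neigh):
    """Mean-aggregate sampled neighbor features, project self and aggregate through
    trainable matrices, concatenate, and apply a nonlinearity."""
    h_agg = h_neighbors.mean(axis=0)                          # aggregate sampled neighbors
    out = np.concatenate([W_self @ h_self, W_neigh @ h_agg])  # combine with own features
    return np.maximum(out, 0)                                 # ReLU

d, hidden = 128, 64
h_self = rng.normal(size=d)                  # features of the node of interest
h_neighbors = rng.normal(size=(5, d))        # features of 5 randomly sampled neighbors
W_self = rng.normal(size=(hidden, d))
W_neigh = rng.normal(size=(hidden, d))

print(sage_step(h_self, h_neighbors, W_self, W_neigh).shape)   # (128,) new node embedding
```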
This is basically the idea of what is happening inside the GNN. It trains a contextual embedding. Embedding is a big buzzword right now. People talk about embeddings all the time. There are embeddings of different kinds. You can train an embedding which will represent my intrinsic features, my intrinsic values, but you can also train an embedding which will represent how I'm connected in society. This is where you think about connectivity and how users and content are connected to other users and content. These are contextual embeddings. Like I said, what happens next is that we take a dot product and predict the click.
Another very important thing, and very different for graph neural networks, is how you batch data on a graph. Batching tabular data is clear. You just take 256 rows, then another 256 rows, and so on. Batching images is also clear. You shuffle them, you batch images, and then you start striding over each image. How do you batch a graph? Depending on what kind of task you're solving, you can either batch links or batch nodes. Makes sense. In our case, we're solving a link prediction task because we are predicting clicks, that is, clicked links. It makes sense to batch links.
In PyTorch Geometric, there is a very nice class that already has everything implemented; it's called LinkNeighborLoader. What it does is batch links. A link in the graph is stored with an index, the features that we associate with it, and the adjacent nodes. This class allows you to batch the links, and along with the links, it also batches the adjacent nodes. We're interested not only in the adjacent nodes, we're also interested in the neighborhoods, because we want to sample nodes from the neighborhoods and pass their features. This LinkNeighborLoader performs our random sampling strategy. We set as a hyperparameter how many neighbors we want to sample from every hop. Usually, it's decreasing numbers, so you would see maybe 10 from the first neighborhood, 5 from the second, and 3 from the third. Like I said, it samples links, adjacent nodes, and adjacent neighborhoods.
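In code, the loader configuration could look roughly like this, reusing the hypothetical data object from the earlier sketch; the decreasing per-hop sample sizes are the values just mentioned:

```python
from torch_geometric.loader import LinkNeighborLoader

edge_type = ("client", "viewed", "entity")
loader = LinkNeighborLoader(
    data,
    num_neighbors=[10, 5, 3],                                  # randomly sampled neighbors per hop
    edge_label_index=(edge_type, data[edge_type].edge_index),  # the links we batch and score
    edge_label=data[edge_type].edge_label,                     # their 0/1 click labels
    batch_size=256,
    shuffle=True,
)
```

In practice you would feed it the supervision links produced by the link split discussed in the pitfalls section below, rather than every edge in the graph.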
Then basically, this is your batch. You would have, for instance, 256 links with their adjacent nodes and adjacent neighborhoods. In this way, it preserves the structure of each edge and how it's embedded in the full graph. Also, in this way, you can dissect the subgraphs and train on them. Then, as I described before, you start your message passing process. You propagate information through these sampled neighborhoods, and eventually you get embeddings for your client and entity node. Every subgraph represents the link that you need to predict. Basically, you process this small subgraph, predict this link, get the label, compute the loss, backpropagate, and so on. Like I said, to predict the click label itself, you need some simple classifier. You can go more complex, but usually a simple classifier is quite enough in this case.
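Continuing the hypothetical ClickModel and loader from the sketches above, a minimal training loop over these subgraph batches might look as follows:

```python
import torch

model = ClickModel(data.metadata())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCELoss()

for batch in loader:
    optimizer.zero_grad()
    # Each batch is a small subgraph: 256 links plus their sampled neighborhoods.
    pred = model(batch)                                              # click probability per link
    label = batch["client", "viewed", "entity"].edge_label.float()
    loss = loss_fn(pred, label)
    loss.backward()                                                  # backpropagate and repeat
    optimizer.step()
```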
Training Process Pitfalls
Training process pitfalls. This all sounds very nice, but it's not so easy to implement. The first thing, which I mentioned before, is data leakage. Graphs are very tricky. It's very nice that everything is connected and the data is all interdependent, as opposed to tabular rows, which are all independent. You have to be very careful, as I mentioned before, to separate your training data from your test data. The best way to approach it is to just prepare them separately. If you don't already store your data as a knowledge graph database, if you prepare the graph on the fly, the best way is to prepare train and test separately. Another important thing, which we didn't really think about when we started working on this: you do this message passing via these links, but then you're also predicting labels for these same links. In a certain way, this is also information leakage to the model. Say we need to predict that this link has label 1, and then we pass information through this link multiple times, so in a way the model may become aware of what label it would have. It's not so easy to understand, but when you start experimenting, you actually see the impact.
In research, they came up with this idea that you need to have a holdout set of the links that you only use for supervision. You only use them as training labels, but you don’t use them for message passing. It actually improved model generalization. It’s called disjoint train ratio, but it’s tricky, because if you set it too low, then, as I said, you can have information leakage, because basically then the model sees all the links during training. If you set it too high, then basically your model will not learn, because you leave out too many links and have too few links during training. It either doesn’t train at all, or it doesn’t generalize. You need to find this sweet spot in the middle.
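In PyTorch Geometric, one place this knob appears is the RandomLinkSplit transform. Here is a sketch of how a 0.3 disjoint train ratio could be configured (the validation and test fractions are zero here because, as described earlier, the test graph is built from a separate day; the reverse edge type name assumes the ToUndirected step from the data-preparation sketch):

```python
import torch_geometric.transforms as T

transform = T.RandomLinkSplit(
    num_val=0.0,
    num_test=0.0,                 # the test graph comes from a separate day instead
    disjoint_train_ratio=0.3,     # 30% of links are held out of message passing
                                  # and used only as supervision labels
    edge_types=("client", "viewed", "entity"),
    rev_edge_types=("entity", "rev_viewed", "client"),
)
train_data, _, _ = transform(data)
# train_data[...].edge_index now holds the message-passing links,
# train_data[...].edge_label_index the supervision links handed to the loader.
```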
Another thing is that you need fallbacks for new clients and new content. Whenever you have a client that has never visited your platform, it means that they're not connected to your graph, and you need to have some kind of fallback for this client. Usually, this problem is solved by showing them the most popular content, and then this starts driving engagement. Then they start connecting to the bigger graph. This is something to keep in mind and think about beforehand. Then there are the sampling strategies. As I said before, we only used random sampling. We did not go beyond that in our experimentation. This is actually very important, because if you do something smarter, like considering a link's recency or importance, you can actually improve your model performance. If you sample randomly, you can sample some node which is not really relevant for this user. For instance, they only viewed it once, but they viewed other pieces more frequently or more recently. This is also something to keep in mind.
Offline Evaluation Results
Some numbers. Maybe you can see the one bold number, which is 0.7788. This is just to show which main hyperparameters you can tune in a graph neural network. As I said, this disjoint train ratio: in our case, 0.3 was the optimal number. Basically, 30% of the links are not used for message passing during training; you only use them as prediction labels. Then, of course, the number of layers and the number of nodes that you sample in these layers or neighborhoods. You need to be careful, because if you sample too much or have too big neighborhoods, you can encounter a problem called embedding over-smoothing. Basically, it means that if the neighborhoods of your nodes are too big, they will intersect so much that the produced embeddings will be very similar.
Then these embeddings become less useful because they don't capture the real contextual representation. It looked so good offline. We thought, yes, let's roll this out. It turned out it's also not so easy. The challenges that we encountered: first of all, when you're talking about a recommendation system, especially on an online shopping platform, or on Instagram, where it's even more pronounced, you want to adjust to shifting user preferences and newly appearing content as fast as possible. If you notice, on Instagram or on TikTok, it's really hyper-personalized. It's like inter-session recommendations. The algorithm captures your preferences so quickly; it adjusts on the fly. To achieve this, you need to retrain your model as frequently as possible. In our case, with the previous system, we were retraining every 30 minutes. We also knew experimentally that if we don't do that, the model becomes stale very quickly. If we skip retrainings for two or three hours, the performance really degrades. It's very important.
What happens if we store our data as a graph? Before, we retrained every 30 minutes, and what we did was incremental training. We have a model that is fully trained, and then we only retrain it, so basically do transfer learning on the additional rows of data that appeared during the last 30 minutes. If you're talking about a graph, you cannot just take this graph of new data separately. You need to understand how it's connected to the bigger graph, because otherwise it doesn't make sense.
Otherwise, you just take the small graph and do not take into consideration the other nodes that would potentially pass their features to these new nodes, or the nodes that already existed in the graph; maybe these are just new links in the graph. It actually creates quite a lot of operational overhead and even requires new infrastructural thinking, because you need to re-approach the way you create the data and the way you retrain your model frequently.
Another problem that we encountered is running inference on a graph. Imagine you open the landing page and you have to wait while it's loading; even extra milliseconds are frustrating. You want to see it right away. When you open the app and it's stuck, it's like, what is going on? Is it the network? What is it? Of course, it's a very bad user experience. Running inference on a graph is also more complicated than running inference with a classic deep neural network, because you need information not just about this particular user and this particular entity, but also about the users and entities that are connected to them. You need to aggregate more data and you need to quickly connect to these neighborhoods, which is not the case for standard tabular data, where you can just take the user and entity, pair their features, run them through the deep and cross network, and you get the probability of a click. Running inference on a graph was unfortunately not feasible for us because of the highly increased latency.
Another thing, maybe less important, but still: if you want to improve your model, you can of course think about a deeper network with more layers, but then the complexity increases exponentially. Also, training time is very important. Our training time cannot go over a certain limit, because otherwise our model becomes stale. Training time was also posing a problem for us. What do we do? The solution we came up with was a hybrid. We train these embeddings for users and entities, but we do this on a daily or sometimes even weekly basis. We store them in a feature store and consume them with our existing downstream model, which has a deep and cross architecture. It actually turned out that this already brings value. We do not need to run the GNN end to end. We can just train embeddings with a GNN, which are contextual embeddings, much richer than the image embeddings that we used before as features for content. It already gives a boost in performance. This was the final architecture. The dotted red box is the trained GNN features, which we now feed to the downstream model, which produces the click probability.
Once again, this red box is run in a daily, offline manner. We train the model, run inference on it, store the features, and then they are consumed during inference by the downstream model, which is not a GNN. These are, again, some numbers. ROC-AUC: this metric basically says how well you're doing your job in terms of separating clickable from non-clickable content. This is, in essence, what matters for us: we want to show the content that would yield clicks. On top is the baseline experiment, the deep and cross neural network without GNN features, and then different GNN features, which were inferred from GNNs of different depths with different numbers of sampled nodes.
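A rough sketch of the offline part of this hybrid setup, reusing the hypothetical ClickModel from earlier: run the trained encoder once over the graph, pull out the contextual embeddings, and hand them to whatever feature store the downstream deep and cross model reads from (the storage step is only hinted at in comments, since the talk does not go into that plumbing):

```python
import torch

@torch.no_grad()
def export_contextual_embeddings(model, data):
    """Run the trained GNN encoder over the full graph and return per-node embeddings."""
    model.eval()
    x_dict = {node_type: x.mean(dim=1) for node_type, x in data.x_dict.items()}
    z = model.encoder(x_dict, data.edge_index_dict)
    return {node_type: emb.cpu() for node_type, emb in z.items()}

# embeddings = export_contextual_embeddings(model, data)
# Write embeddings["client"] / embeddings["entity"] to the feature store keyed by ID,
# refreshed daily or weekly; the downstream deep & cross model then consumes them as
# ordinary tabular features during its frequent retraining and low-latency inference.
```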
Conclusion and Future Steps
I just wanted to mention some future steps, which I probably will not take because I'm not working on recommender systems anymore, but I really enjoyed it. Basically, the steps you could take: like I said, you could implement smarter sampling strategies. You could enrich your node features, because what we used is just the last bought items, but Zalando actually has so much more data; the graph can be enriched, and these enriched embeddings could probably also improve the model. Also, right now, Zalando is moving more toward the idea of being about entertainment and inspiration, and within the graph neural network, there are also levers that allow you to control the novelty and diversity of the content that you recommend. Basically, if you want to take the user a bit out of their usual tunnel and show them something that they never bought but might be interested in, to broaden their horizons a little bit, this you can also control within the GNN. These are potential future steps.
Questions and Answers
Participant 1: I think you already mentioned how you dealt with new users. How did you deal with new products? Because I assume the initial neighbor graph for products is what was in the same purchase, and if you don’t have a purchase history, how would you ever recommend something new except for random sampling?
Mariia Bulycheva: Our system is actually very complex, and it takes care of that. We have this special pool of new content, where every piece goes through Thompson sampling until it reaches 500 views. We have a separate pipeline which does this sampling from the pool of content that has not yet reached 500 views, because we want to ensure that this content also has an opportunity to be shown to the user. Once pieces are shown, they have the opportunity to engage the user. If the user views or clicks, that's a good signal. They gather these views. Then, once they reach 500 views, they're basically already connected to the graph, and they become part of the pool that the recommendation system runs on.
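To illustrate the idea (an illustrative Beta-Bernoulli formulation, not necessarily the exact one used at Zalando), a Thompson sampling step over such a cold-start pool could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cold-start pool: click and view counts per new content piece.
pool = {"c1": {"clicks": 2, "views": 40}, "c2": {"clicks": 0, "views": 10}}

def pick_new_content(pool, view_threshold=500):
    """Sample a click-rate estimate per piece from its Beta posterior and pick the best.
    Pieces graduate out of the pool once they reach the view threshold."""
    candidates = {cid: s for cid, s in pool.items() if s["views"] < view_threshold}
    if not candidates:
        return None
    samples = {
        cid: rng.beta(1 + s["clicks"], 1 + s["views"] - s["clicks"])
        for cid, s in candidates.items()
    }
    return max(samples, key=samples.get)

print(pick_new_content(pool))
```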
Participant 2: I was curious how you dealt with overfitting. For example, big shoppers that spend the entire day, I suppose they could lead to overfitting, because it’s not them that you need to learn about and be hooked, but you would continuously retrain on their activity.
Mariia Bulycheva: That's actually a very important point, and we talk a lot about it at Zalando, because when you're thinking about improving your recommender model, you actually need to understand who your target audience is, because you cannot improve your algorithm for everybody. Like you said, if you just train on all the data, you improve it for the highly engaged users, because they are the ones who are driving it, but you want to make sure that the ones who are less engaged also become engaged.
Our policy was to focus on the ones who are in the middle, because they are the ones for whom it's easier to move the needle. The ones who are very low engagers probably don't care. They know what they want; they come to the platform once per year, on Black Friday, they shop, they leave. How do we solve the problem? We do some data filtering based on user activity to make it more balanced. We have an algorithm to filter out very special outliers, but we also want the model to be good for those frequent users. We want them to have a good experience. If they're the ones in the driving seat, that's fine too. I cannot say we do something very specific for that.
Participant 2: Have you considered having different models for these very engaged users?
Mariia Bulycheva: Different models? I would say no, because if you take out these highly engaged users, you would be left with very little data. For users that come to the platform only two or three times, there's just very little data to train on. And the ones who are very active may have similarities with less active users, for whom it might be beneficial to extract information from the more active ones.
Participant 3: How are you dealing with the question of homogeneity with respect to heterogeneity in the 40 examples that you were selecting? Because all your presentation was focused on selecting a specific one, but then this could potentially lead to just showing 40 varieties of the same product versus showing sufficient heterogeneity there.
Mariia Bulycheva: Of course, in industry it's always a bit more complex. We have things called business rules that are applied on top of our model, which we as machine learning engineers hate, because then it's very hard to run pure A/B tests. This is how it's solved. We have rules such as not having too much sponsored content, which is basically ads, one piece after another. We cannot have multiple carousels one after another. This is taken care of algorithmically. It's not solved by machine learning; the content is scored, and then these business rules apply, yes.
Participant 4: Can you please say a word or two about the process that you use to select the values for your hyperparameters?
Mariia Bulycheva: This one?
Participant 4: Yes, for example, how did you select those numbers?
Mariia Bulycheva: Experimenting. We ran experiments, we evaluated. Of course, we initially referred to research papers. The backbone of our architecture is GraphSAGE, which was, I think, developed by computer scientists at Stanford, and then it was implemented by Pinterest. I think they still have it running in production. Yes, usually it’s research papers and looking at how others did this, and then trying to fine-tune and see what’s working best for us. Then, of course, you need to be mindful of such things as training time. You have certain industrial limits that don’t allow you to experiment with too many hyperparameters.
Participant 5: Do these hyperparameters also include, for example, the preferences of people who are shopping a lot, or even a specific time when you know that the shopping will increase, for example, specific seasons?
Mariia Bulycheva: We have timestamps of the session, so when the session occurred, we have that as a feature. We don't account specifically for highly engaged and low engaged users, but in terms of features, like I said, users who shop rarely would just have a lot of zeros in their node features, while those who shop frequently would have a full feature matrix. To summarize, in our training we do not differentiate between user types; that's just our general approach.
Participant 5: Do you take season as a factor? For example, in December, in the festive season, the shopping would be more?
Mariia Bulycheva: No, we don't take into account seasonality in terms of months or seasons of the year. For us, the seven-day cycle is what's important, because shopping patterns are very different from Monday to Saturday or from Wednesday to Sunday. This we want to capture within this week timeframe, but seasonal patterns get outdated very quickly. Something that happened two months ago does not have much to do with user activity now. In terms of engagement on the landing page, there isn't that much difference whether it's December or summer. It does matter for pricing. I used to work in the pricing department before, and there, the experimental cycles and A/B testing phases were much longer, because we wanted to capture how prices for different products change and how demand for different products changes from one season to another. But viewing and clicking behavior, not so much. We don't see so much dependency within the season. It's more within the week.