"How can we generate useful recommendations? Your web application has a stable number of users, you've finished the core product and want to add on experimental features to improve customer retention.
GOOLYBIB OFFICE - MONDAY MORNING
Your boss sends you a message: this is probably about adding extra functionality to the React application.
Hey, have you looked at building some simple user recommendations yet?\nIt would be great if our app could recommend relevant products that have been seen together.\nJust like how Amazon has that 'People who viewed this product have also viewed these products' section?
Remember to make a copy of the Google Colab file so that you can follow along!
Recommendation Engine\nAll right. So we've been tasked with adding some Amazon-style recommendations. You'll see these on Amazon, for example, as 'products related to this item'. We'll be taking an approach where we partition the data by user ID, which is quite useful if you want to generate carousels, for example 'people who viewed this course also viewed these courses' or 'people who viewed this product also viewed these products'.\nBefore we get into any of the code, I think it's really important that we describe this process step by step, and I've written some pseudocode for us to go through. First we'll import our modules and some dummy data that you'll have access to. Then you'll extract all of the courses or products, or whatever your item is, from the database. We'll then need to extract how users are viewing those items or courses from the database, wrangling the data into a specific format inside of a pandas data frame.\nOnce we have users against courses, or users against products, we'll generate all of the combinations in which those courses have been seen together, partitioned by user ID. We'll then create the recommendations, and we'll also provide a graceful fallback option just in case a specific course or product is new and hasn't been seen in combination with other courses or products.\nAnd then there's just a little bit on the fact that we'll also need to store these recommendations inside of the database. 
So we run these two bits of code, where we download some dummy data. I'm conscious that some of you might not have a production database you can connect to, so we've taken the liberty of producing some dummy data for you.\nWhat you're going to need to do in your script, once you've connected and extracted your data from your database, is have it in a format where you have a Python list of your product IDs or your course IDs. You'll also need a format like this, where you've got the product ID and some extra properties for each individual product as a Python dictionary.\nSo with this data.json file, if we have a look, we load in some of that course data, which looks like this. You can see it's quite a large dictionary structure, but if we scroll all the way to the top, you can see the course ID in our case and some extra information.\nIt's exactly this format: a course or product ID, and then some extra information nested inside it. Once you've extracted the product IDs as a list, and you've also got the products in a dictionary, what you'll then need to do is extract the users in relation to how they're accessing your courses or your products.\n\nThis is also a section you would need to write yourself, and it's completely bespoke. If you were using Firebase, then maybe you would be storing how users go through your product and interact with courses, products, or items. As an example, in our scenario we have a real-time database which connects to the users, and we can essentially get all of that data in using Python.\nThere's some sample data provided for you. 
The format that we're going to need to wrangle the data into is going to be like this, where we have this core dataset with item IDs and user IDs, showing which users viewed which items. If we look at this in a very simplistic way, we have the item ID, which could be your course ID, your product ID, or any sort of item, and then we have the user ID.\nWhat's important about this, as you can see, is that certain users have viewed multiple items. You can do more complicated recommendations if you add in a rating column as well, but for this example we wanted to build a recommendation engine that purely uses the co-occurrence, or the combinations, of courses based upon a user ID.\nSo we have this format, and you need to get the data into it. Imagine you've imported your course IDs as a list up here, and you've also got your courses as a dictionary. What we then want as well is our user data: how they are viewing your courses, on a per-user basis.\nIf we have a look at this one, you'll see that the user ID is the first key. I'm just going to scroll up quickly to the top; this is quite long, as you can imagine, because there's quite a lot of user data here.\nEssentially what you're looking to do is say, for each and every individual user, which courses did they view? So that's how you need to wrangle the data, right? 
Whether it's you or a developer, you need to basically say: we have some users here, like this user for example, and these are the courses that they've specifically viewed.\nThat's how you should be thinking about this, regardless of the shape of your data. You need the actual courses or products, and you need a product dictionary in case you need to insert any more information. Then, going on to this step, we also need to have how the users relate to those.\nThat basically means going through the data, whatever format it's in, and finally creating a structure like this. This data structure starts out like this. In our case, we have a dictionary of users, so we're looping over all of these users, and we're looking to see whether that specific user has a course dictionary inside that course key.\nIf they do, we create a list of courses from that, and then we check whether those courses exist in our current set of product IDs, or course IDs. If any of them do, then for every course that we found there, we add it as a pair, right?\nSo we're essentially going: loop over all the users; for each user, find all the courses, or in your case it might be all the products, that the user has viewed. If the user has viewed any products, say five of them, loop over those five products and add each one as a pair of item ID and user ID.\nWe end up with a data structure that looks a little bit like this. If I run this code, you'll see this changes, so now we have a slightly larger data structure where we've got all of the individual item IDs against user IDs.\nAnd this is what it looks like inside of a data frame. 
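The loop just described could be sketched roughly as follows. This is a minimal sketch, not the code from the Colab file: the shape of the users dictionary, the 'courses' key, the helper name build_pairs, and every ID are assumptions made up for illustration, and the toy data produces far fewer rows than the real dataset.

```python
import pandas as pd

# Hypothetical stand-ins for the data extracted from your database.
course_ids = ['c1', 'c2', 'c3']
users = {
    'u1': {'courses': {'c1': {}, 'c2': {}}},
    'u2': {'courses': {'c2': {}, 'c3': {}, 'old-course': {}}},
    'u3': {},  # this user has not viewed anything
}


def build_pairs(users, course_ids):
    # Return one item_id/user_id row per course that a user has viewed.
    valid_ids = set(course_ids)
    rows = []
    for user_id, user in users.items():
        # Skip users with no course dictionary under the assumed 'courses' key.
        viewed = user.get('courses') or {}
        for course_id in viewed:
            # Only keep courses that exist in the current list of course IDs.
            if course_id in valid_ids:
                rows.append({'item_id': course_id, 'user_id': user_id})
    return rows


df = pd.DataFrame(build_pairs(users, course_ids), columns=['item_id', 'user_id'])
print(df.shape)  # (4, 2)
```

Whatever your database looks like, the goal is the same two-column table: one row for every item a user has viewed.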
So you can see there are 214 rows by two columns: we have every item ID against the specific user ID. We now understand which courses, or which products, users as a whole have viewed. This is quite useful because we can then apply a combinations function to it.\nThis section here essentially groups the data by user ID, so we're partitioning the data by user. What's quite nice about this is that we can then work out aggregated metrics, for example counting how many item IDs each user has. You can see this user has viewed two courses. But we can take this one step further and use an .apply function.\nWhat we've done here is say: group the data by user ID, and after you've grouped it, take the item ID and apply a function to it. When we run that, you'll see it produces the data back by user ID, with the combinations of those items.\nThen what we're looking at here is which courses have been viewed together most often, in combinations at a user level. These two course IDs here have been viewed together a lot more across all of the users than other courses, and therefore we'd probably want to include them together in a recommendations algorithm: if you're looking at this course, you'd also like to be able to look at these courses.\nSo that's what this groupby with the .apply, combined with value_counts, is doing under the hood. To put it in a more simplistic manner: you basically end up with paired combinations of courses and how often those courses are found together. 
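As a rough illustration of that groupby, .apply and value_counts chain, here is a small sketch on made-up data. The helper name item_pairs, the sorting and de-duplication step, and the use of explode to flatten the per-user lists are assumptions of this sketch and may differ from the Colab file.

```python
from itertools import combinations

import pandas as pd

# Made-up item/user pairs in the same two-column shape as the lesson's data frame.
df = pd.DataFrame({
    'item_id': ['c1', 'c2', 'c1', 'c2', 'c3', 'c2', 'c3'],
    'user_id': ['u1', 'u1', 'u2', 'u2', 'u2', 'u3', 'u3'],
})

# Aggregated metric: how many items each user has viewed.
views_per_user = df.groupby('user_id')['item_id'].count()


def item_pairs(items):
    # Sort and de-duplicate so ('c1', 'c2') and ('c2', 'c1') count as the same pair.
    return list(combinations(sorted(set(items)), 2))


# One list of pairs per user, flattened to one pair per row, then counted.
pair_counts = (
    df.groupby('user_id')['item_id']
    .apply(item_pairs)
    .explode()
    .dropna()  # users with a single item produce no pairs
    .value_counts()
)
print(pair_counts)
```

Because value_counts sorts from most to least frequent, the pairs at the top of the result are the strongest candidates for a 'people also viewed' carousel.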
So we can then write something to say: now we've got all these combinations, we need to look at all of the courses, or all the products, that we have. We set up a new dictionary, which in our case is 'people also viewed this course', but for you it could be 'people also viewed this product', by looping over those product IDs.\nThen, for each product ID, we loop over the combinations data and pull out the individual combination, so this first item and that second item. We're then saying: if we don't yet have three recommendations for that individual product or course, keep going; otherwise we just want to break out of this for loop and go on to the next product.\nSo if the course has already got three recommendations, we break out of that loop. Otherwise, we look at whether the first item in the combination is equal to that course, and if so we append the other item; and we do the reverse if the second item in the combination matches. Else, we don't have a match. There might be some products that you don't find a match for, and that's something we're going to handle later.\nYou can see that when we do this, we end up with a slightly larger dictionary. If we look at this 'people also viewed this course' dictionary, we now have every product along with the recommended three products to go with it.\nSo you can imagine on a product page, you could tune this parameter to be three or five, and then you could have a carousel of the main products that are often seen with that product, after we've partitioned the data by user. So that's all great.\nGoing on from here: okay, say we've got a hundred products. 
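The matching loop described above might be sketched like this in plain Python. The pair counts, the course IDs, and the names people_also_viewed and MAX_RECOMMENDATIONS are all hypothetical; the only assumption carried over from the lesson is that the pairs arrive sorted from most to least frequent.

```python
# Hypothetical pair counts, most frequent first (the order value_counts returns).
pair_counts = [
    (('c1', 'c2'), 5),
    (('c2', 'c3'), 4),
    (('c1', 'c3'), 2),
    (('c2', 'c4'), 1),
]
course_ids = ['c1', 'c2', 'c3', 'c4', 'c5']
MAX_RECOMMENDATIONS = 3  # the tunable parameter: three, five, and so on

people_also_viewed = {}
for course_id in course_ids:
    recommendations = []
    for (first, second), _count in pair_counts:
        if len(recommendations) >= MAX_RECOMMENDATIONS:
            break  # this course is full, move on to the next one
        if first == course_id:
            recommendations.append(second)
        elif second == course_id:
            recommendations.append(first)
        # Otherwise this pair does not involve the course, so skip it.
    people_also_viewed[course_id] = recommendations

print(people_also_viewed)
```

In this toy run, 'c5' ends up with an empty list because it never appears in a pair, which is exactly the case the fallback step is meant to cover.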
Maybe we've got 20 new products that just came out yesterday, and none of our users have seen these new products alongside other products. What we could do is take the original data, if you remember, going back to this data frame of the item ID against the user ID, and do a value_counts on it.\nBy doing a value_counts on this, we get an understanding of which products have been viewed the most by our users. You could also use Google Analytics data for this, so that's something you might look to implement; essentially you want some sort of understanding of which courses have been viewed most often.\nWe can then use these for the courses that fell into this else block here. Where we didn't find a match, what we'd like to do is loop back over that dictionary.\nSo we extract the popular courses into columns, and then we have this data frame here, where we've got the course and the number of times it was viewed. Cool. We can then loop over every course or product that you have.\nIf the length of its list in the dictionary is not equal to three, then we know that we haven't found enough courses, and therefore we get the popular course IDs and loop over this new data frame that we've made. If the popular course isn't the course itself, then we append it.\nThen what we have here is a check: if there are three or more results, we break out, which basically means we're just going to be taking the top three most popular courses every time. 
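A fallback pass along those lines could look like the sketch below. To keep it dependency-free, collections.Counter stands in for the value_counts call used in the lesson, and the duplicate check is an extra safeguard that the walkthrough does not mention. All data and names here are hypothetical.

```python
from collections import Counter

# Hypothetical inputs: item/user view pairs, and recommendations with gaps.
views = [
    ('c1', 'u1'), ('c2', 'u1'), ('c1', 'u2'), ('c2', 'u2'),
    ('c3', 'u2'), ('c1', 'u3'), ('c3', 'u3'), ('c2', 'u4'), ('c1', 'u4'),
]
people_also_viewed = {'c1': ['c2', 'c3'], 'c4': ['c2'], 'c5': []}
MAX_RECOMMENDATIONS = 3

# Most viewed items first, the same ordering value_counts would give.
view_counts = Counter(item for item, _user in views)
popular = [item for item, _count in view_counts.most_common()]

for course_id, recommendations in people_also_viewed.items():
    for candidate in popular:
        if len(recommendations) >= MAX_RECOMMENDATIONS:
            break  # already full, nothing more to add
        # Never recommend a course on its own page, and avoid duplicates.
        if candidate != course_id and candidate not in recommendations:
            recommendations.append(candidate)

print(people_also_viewed)
```

Note that 'c1' still has only two recommendations afterwards, because this toy catalogue has just two other viewed items to offer it.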
We append those into that individual dictionary.\nSo what you end up with is a final dictionary that you could use, called 'people who also viewed this course', and you could change the name of this. Essentially the way to think about it is that every single key in here is the product, and the array holds the courses that have most often been seen with that course. So the first one is seen more than the second one, and the second one is seen more than the third.\nYou could then save this into your backend. The way that we do that is we have a reference in our database called user course view recommendations, and we set that up. Then on the client side, you can reference that. You can see here we're using the Firebase database, and by referencing that we can get those recommendations.\nSo this is an end-to-end way that you could easily implement recommendations. I think the other important thing for us to cover is the deployment of this, so I just thought I'd touch on that.\nYou'll have access to all the source code, and the only things you're going to need to change are the sections where the code will be unique to you: this section, this section, and also the uploading back into your database. There are three sections that your developer or yourself will need to write which are custom, but you've got an example of how that's currently implemented in our system.\nIf we're looking at a deployment method, you could, for example, have a cron job that runs every day or every week. It could hit a cloud function that runs this Python code, and that function would do something like this: create the user course recommendations. You could also separate out the fallback recommendations, if you wanted, into a separate script. 
Or you could include it all in one single Python script that's just run by a cloud function. So every day we could hit this cloud function at three in the morning and generate fresh recommendations for any products that were added the night before.\nSo hopefully this gives you an indication of how you could start to add recommendations functionality into your app.
What data do you need to make 'users also viewed' recommendations?
What does 'count of occurrence' mean for section 5 \"5. Generate combinations of all courses/products found together, partitioned by user id:\"?
Why is it a good idea to have a fallback for recommendations?
"