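While designing a recommendation system, I ran into a case where the collaborative filtering implementation needs votes or something similar. Our system, however, has no rating/voting field at all, so I'd like to infer an equivalent rating/vote from the timestamps at which users watched shows.
This is what the view history looks like: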
    subscriber_id  content_id  timestamp
    1              123         1576833135000
    1              124         1576833140000
    1              125         1576833145000
    1              126         1576833150000
    1              127         1576833155000
    1              128         1576833160000
    1              129         1576833165000
    1              130         1576833170000
    1              131         1576833175000
    1              132         1576833180000
    2              123         1576833135000
    2              124         1576833140000
    2              125         1576833145000
    2              126         1576833150000
    2              127         1576833155000
    2              128         1576833160000
    2              129         1576833165000
    2              130         1576833170000
    2              131         1576833175000
    2              132         1576833180000
    2              133         1576833185000
    2              134         1576833190000
    2              135         1576833195000
    2              136         1576833200000
    2              137         1576833205000
    2              138         1576833210000
    2              139         1576833215000
    2              140         1576833220000
    2              141         1576833225000
    2              142         1576833230000
    2              143         1576833235000
    2              144         1576833240000
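I want to assign each entry a number in the range 5 to 1 (5 being the most recent view). I have already implemented a ranking, but it does not stay within that range: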
    df1['rank'] = df1.sort_values(['subscriber_id','timestamp']) \
                     .groupby(['subscriber_id'])['timestamp'] \
                     .rank(method='max').astype(int)
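A quick way to see why this ranking is not enough: `rank` numbers each user's views 1..n, so the top value is simply that user's view count. The sketch below rebuilds the sample history programmatically (an assumption that reproduces the table above: content ids from 123 upward, one view every 5 seconds) and runs the same ranking.

```python
import pandas as pd

# Rebuild the sample view history (assumed to match the table above:
# content ids 123.., one view every 5 seconds, same start time for both users)
rows = [(user, c, 1576833135000 + (c - 123) * 5000)
        for user, last in [(1, 132), (2, 144)]
        for c in range(123, last + 1)]
df1 = pd.DataFrame(rows, columns=['subscriber_id', 'content_id', 'timestamp'])

# The ranking from the question: 1 = oldest view, n = newest view of that user
df1['rank'] = (df1.sort_values(['subscriber_id', 'timestamp'])
                  .groupby(['subscriber_id'])['timestamp']
                  .rank(method='max')
                  .astype(int))

# The highest rank equals the number of views, so it is not capped at 5
print(df1.groupby('subscriber_id')['rank'].max().tolist())  # [10, 22]
```

The rank is still useful as an ordering; it just has to be squeezed into five buckets, which is what the expected output below does.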
Expected output:
    subscriber_id  content_id  timestamp      rating
    1              123         1576833135000  1
    1              124         1576833140000  1
    1              125         1576833145000  2
    1              126         1576833150000  2
    1              127         1576833155000  3
    1              128         1576833160000  3
    1              129         1576833165000  4
    1              130         1576833170000  4
    1              131         1576833175000  5
    1              132         1576833180000  5
    2              123         1576833135000  1
    2              124         1576833140000  1
    2              125         1576833145000  1
    2              126         1576833150000  1
    2              127         1576833155000  2
    2              128         1576833160000  2
    2              129         1576833165000  2
    2              130         1576833170000  2
    2              131         1576833175000  3
    2              132         1576833180000  3
    2              133         1576833185000  3
    2              134         1576833190000  3
    2              135         1576833195000  4
    2              136         1576833200000  4
    2              137         1576833205000  4
    2              138         1576833210000  4
    2              139         1576833215000  4
    2              140         1576833220000  5
    2              141         1576833225000  5
    2              142         1576833230000  5
    2              143         1576833235000  5
    2              144         1576833240000  5
Any help would be appreciated!
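Answer (score: 1): It makes sense now. The solution is to count each user's rows, divide that count by 5, and build the rating list from the quotient and the remainder (modulo): every rating from 1 to 5 gets the quotient's worth of rows, and the leftover rows are added one each to the highest ratings, so the most recent views fill the slightly larger buckets.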
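The bucket sizes described above can be written as a short formula. This is a restatement of the answer's logic (not taken from the original post), with $c_k$ the number of rows that receive rating $k$ and $[\cdot]$ the Iverson bracket (1 if the condition holds, else 0); the two examples reproduce the counts in the expected output.

```latex
\begin{aligned}
n &= 5q + r, \qquad 0 \le r < 5 \\
c_k &= q + [\,k > 5 - r\,], \qquad k = 1, \dots, 5 \\
\textstyle\sum_{k=1}^{5} c_k &= 5q + r = n \\
\text{user 1: } 10 &= 5 \cdot 2 + 0 \;\Rightarrow\; c = (2, 2, 2, 2, 2) \\
\text{user 2: } 22 &= 5 \cdot 4 + 2 \;\Rightarrow\; c = (4, 4, 4, 5, 5)
\end{aligned}
```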
    import pandas as pd
    from io import StringIO

    data = StringIO("""content_id subscriber_id timestamp
    123 1 1576833135000
    124 1 1576833140000
    125 1 1576833145000
    126 1 1576833150000
    127 1 1576833155000
    128 1 1576833160000
    129 1 1576833165000
    130 1 1576833170000
    131 1 1576833175000
    132 1 1576833180000
    123 2 1576833135000
    124 2 1576833140000
    125 2 1576833145000
    126 2 1576833150000
    127 2 1576833155000
    128 2 1576833160000
    129 2 1576833165000
    130 2 1576833170000
    131 2 1576833175000
    132 2 1576833180000
    133 2 1576833185000
    134 2 1576833190000
    135 2 1576833195000
    136 2 1576833200000
    137 2 1576833205000
    138 2 1576833210000
    139 2 1576833215000
    140 2 1576833220000
    141 2 1576833225000
    142 2 1576833230000
    143 2 1576833235000
    144 2 1576833240000
    """)

    # load data into a data frame; r'\s+' splits on any run of whitespace
    df = pd.read_csv(data, sep=r'\s+')

    # collect one rated frame per user
    frames = []
    for user in df['subscriber_id'].unique():
        # select one user's rows, oldest view first (copy avoids SettingWithCopyWarning)
        df2 = df[df['subscriber_id'] == user].sort_values('timestamp').copy()
        items_number = df2.shape[0]
        ranks_repeat, modulo_remainder = divmod(items_number, 5)
        # every rating 1..5 gets ranks_repeat rows; the leftover rows go one each
        # to the highest ratings (remainder 1 -> extra 5, 2 -> extra 4 and 5, ...)
        rating = []
        for i in range(1, 6):
            extra = 1 if i > 5 - modulo_remainder else 0
            rating.extend([i] * (ranks_repeat + extra))
        df2.insert(3, 'rating', rating)
        frames.append(df2)

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    results = pd.concat(frames)
    print(results)
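If the per-user loop becomes slow on a large view history, the same bucket sizes can also be computed without a Python loop. The sketch below is a possible vectorized variant, not part of the original answer; `timestamp_rating` and its parameters are hypothetical names. It uses `groupby().cumcount()` for each view's position and `transform('size')` for the user's view count, then applies the quotient/remainder split with NumPy.

```python
import numpy as np
import pandas as pd

def timestamp_rating(df, user_col='subscriber_id', ts_col='timestamp', levels=5):
    """Hypothetical helper: rating 1..levels per user, newest views rated highest.

    Same bucket sizes as the answer's loop: n // levels rows per rating, with
    the n % levels leftover rows added one each to the highest ratings.
    """
    out = df.sort_values([user_col, ts_col]).copy()
    pos = out.groupby(user_col).cumcount().to_numpy()               # 0 = oldest view
    n = out.groupby(user_col)[ts_col].transform('size').to_numpy()  # views per user
    q, r = np.divmod(n, levels)
    boundary = (levels - r) * q             # rows that fall into the q-sized buckets
    low = pos // np.maximum(q, 1) + 1       # max(q, 1) avoids dividing by zero when q == 0
    high = (levels - r) + (pos - boundary) // (q + 1) + 1
    out['rating'] = np.where(pos < boundary, low, high)
    return out

# Rebuild the sample history (assumption: content ids 123.., one view every 5 s)
rows = [(user, c, 1576833135000 + (c - 123) * 5000)
        for user, last in [(1, 132), (2, 144)]
        for c in range(123, last + 1)]
df = pd.DataFrame(rows, columns=['subscriber_id', 'content_id', 'timestamp'])

rated = timestamp_rating(df)
print(rated.groupby('subscriber_id')['rating'].value_counts().sort_index())
```

Because the position is computed per user, a user with fewer than five views simply gets the top ratings (three views become 3, 4, 5), matching the counts formula rather than failing.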
Result:
        content_id  subscriber_id  timestamp      rating
    0   123         1              1576833135000  1
    1   124         1              1576833140000  1
    2   125         1              1576833145000  2
    3   126         1              1576833150000  2
    4   127         1              1576833155000  3
    5   128         1              1576833160000  3
    6   129         1              1576833165000  4
    7   130         1              1576833170000  4
    8   131         1              1576833175000  5
    9   132         1              1576833180000  5
    10  123         2              1576833135000  1
    11  124         2              1576833140000  1
    12  125         2              1576833145000  1
    13  126         2              1576833150000  1
    14  127         2              1576833155000  2
    15  128         2              1576833160000  2
    16  129         2              1576833165000  2
    17  130         2              1576833170000  2
    18  131         2              1576833175000  3
    19  132         2              1576833180000  3
    20  133         2              1576833185000  3
    21  134         2              1576833190000  3
    22  135         2              1576833195000  4
    23  136         2              1576833200000  4
    24  137         2              1576833205000  4
    25  138         2              1576833210000  4
    26  139         2              1576833215000  4
    27  140         2              1576833220000  5
    28  141         2              1576833225000  5
    29  142         2              1576833230000  5
    30  143         2              1576833235000  5
    31  144         2              1576833240000  5
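Since the point of the inferred rating is to feed collaborative filtering, one possible next step (an assumption about the downstream pipeline, not part of the answer) is to reshape the rated history into a user × item matrix with `pivot_table`. The small frame below hard-codes a few rows copied from the result table; items a user never watched stay `NaN`.

```python
import pandas as pd

# A few rows copied from the result table above (illustrative subset only)
rated = pd.DataFrame({
    'subscriber_id': [1, 1, 2, 2, 2],
    'content_id':    [123, 132, 123, 132, 144],
    'rating':        [1, 5, 1, 3, 5],
})

# Rows = users, columns = items, values = inferred rating; unseen items are NaN
user_item = rated.pivot_table(index='subscriber_id', columns='content_id', values='rating')
print(user_item)
```

`pivot_table` averages duplicates by default, which matters if the same user watched the same item more than once; `pivot` would raise an error in that case instead.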