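Pandas: assign points from 5 to 1 based on timestamp, with the most recent rows getting the most points
Question:
While designing a recommender system, I ran into a situation where the collaborative filtering implementation needs votes or something similar. Our system, however, has no rating/vote field. I would like to infer a comparable rating/vote from the timestamps at which a user watched each show.
This is what the view history looks like.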
subscriber_id content_id timestamp
1 123 1576833135000
1 124 1576833140000
1 125 1576833145000
1 126 1576833150000
1 127 1576833155000
1 128 1576833160000
1 129 1576833165000
1 130 1576833170000
1 131 1576833175000
1 132 1576833180000
2 123 1576833135000
2 124 1576833140000
2 125 1576833145000
2 126 1576833150000
2 127 1576833155000
2 128 1576833160000
2 129 1576833165000
2 130 1576833170000
2 131 1576833175000
2 132 1576833180000
2 133 1576833185000
2 134 1576833190000
2 135 1576833195000
2 136 1576833200000
2 137 1576833205000
2 138 1576833210000
2 139 1576833215000
2 140 1576833220000
2 141 1576833225000
2 142 1576833230000
2 143 1576833235000
2 144 1576833240000
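I want to assign each entry a number from 5 to 1 (5 being the most recent). I have already implemented a ranking, but it does not map onto that range.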
df1['rank'] = df1.sort_values(['subscriber_id','timestamp']) \
.groupby(['subscriber_id'])['timestamp'] \
.rank(method='max').astype(int)
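To illustrate the problem, here is a minimal sketch on a made-up single-subscriber frame with 7 views (the frame itself is an assumption for illustration, not part of the real data). The per-group rank simply counts up to the number of rows, so it is not confined to 1 to 5:

```python
import pandas as pd

# Hypothetical miniature of the view history: 7 views for one subscriber.
df1 = pd.DataFrame({
    "subscriber_id": [1] * 7,
    "content_id": range(123, 130),
    "timestamp": [1576833135000 + 5000 * i for i in range(7)],
})

# Same ranking as in the question: rank timestamps within each subscriber.
df1["rank"] = (
    df1.sort_values(["subscriber_id", "timestamp"])
       .groupby("subscriber_id")["timestamp"]
       .rank(method="max")
       .astype(int)
)

print(df1["rank"].tolist())  # [1, 2, 3, 4, 5, 6, 7]
```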
Expected output:
subscriber_id content_id timestamp rating
1 123 1576833135000 1
1 124 1576833140000 1
1 125 1576833145000 2
1 126 1576833150000 2
1 127 1576833155000 3
1 128 1576833160000 3
1 129 1576833165000 4
1 130 1576833170000 4
1 131 1576833175000 5
1 132 1576833180000 5
2 123 1576833135000 1
2 124 1576833140000 1
2 125 1576833145000 1
2 126 1576833150000 1
2 127 1576833155000 2
2 128 1576833160000 2
2 129 1576833165000 2
2 130 1576833170000 2
2 131 1576833175000 3
2 132 1576833180000 3
2 133 1576833185000 3
2 134 1576833190000 3
2 135 1576833195000 4
2 136 1576833200000 4
2 137 1576833205000 4
2 138 1576833210000 4
2 139 1576833215000 4
2 140 1576833220000 5
2 141 1576833225000 5
2 142 1576833230000 5
2 143 1576833235000 5
2 144 1576833240000 5
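Any help would be greatly appreciated!
Answer (1 vote):
Now it makes sense. The solution is to build the rating list from the selected user's row count divided by 5, using the remainder (modulo) to decide which ratings receive one extra row.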
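The arithmetic behind this can be sketched on its own before looking at the DataFrame code. The helper name `bucket_sizes` and its parameters are assumptions made for this illustration; the rule it encodes (leftover rows go to the highest ratings) is the one that reproduces the expected output in the question.

```python
def bucket_sizes(n_items, n_ratings=5):
    """Return how many rows receive each rating 1..n_ratings (oldest first)."""
    repeat, remainder = divmod(n_items, n_ratings)
    # The `remainder` highest ratings each take one extra row.
    return [repeat + (1 if rating > n_ratings - remainder else 0)
            for rating in range(1, n_ratings + 1)]

print(bucket_sizes(10))  # [2, 2, 2, 2, 2]
print(bucket_sizes(22))  # [4, 4, 4, 5, 5]
```

Subscriber 1 has 10 rows, so every rating covers 2 rows. Subscriber 2 has 22 rows, so ratings 1 to 3 cover 4 rows each and ratings 4 and 5 cover 5 rows each. The full script below applies the same split per subscriber.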
import pandas as pd
from io import StringIO
data = StringIO("""
content_id subscriber_id timestamp
123 1 1576833135000
124 1 1576833140000
125 1 1576833145000
126 1 1576833150000
127 1 1576833155000
128 1 1576833160000
129 1 1576833165000
130 1 1576833170000
131 1 1576833175000
132 1 1576833180000
123 2 1576833135000
124 2 1576833140000
125 2 1576833145000
126 2 1576833150000
127 2 1576833155000
128 2 1576833160000
129 2 1576833165000
130 2 1576833170000
131 2 1576833175000
132 2 1576833180000
133 2 1576833185000
134 2 1576833190000
135 2 1576833195000
136 2 1576833200000
137 2 1576833205000
138 2 1576833210000
139 2 1576833215000
140 2 1576833220000
141 2 1576833225000
142 2 1576833230000
143 2 1576833235000
144 2 1576833240000
""")
# load data into a data frame (columns are separated by single spaces)
df = pd.read_csv(data, sep=' ')
# get unique users
user_list = df['subscriber_id'].unique()
# collect the per-user frames here
frames = []
for user in user_list:
    # select one user's rows, oldest first; .copy() avoids SettingWithCopyWarning
    df2 = df[df['subscriber_id'] == user].sort_values('timestamp').copy()
    items_number = df2.shape[0]
    modulo_remainder = items_number % 5
    ranks_repeat = items_number // 5
    # create rating list based on modulo
    rating = []
    for i in range(1, 6):
        # the highest modulo_remainder ratings each get one extra row
        extra = 1 if i > 5 - modulo_remainder else 0
        rating.extend([i] * (ranks_repeat + extra))
    df2.insert(3, 'rating', rating, True)
    frames.append(df2)
# DataFrame.append was removed in pandas 2.0, so concatenate instead
results = pd.concat(frames)
print(results)
Result:
content_id subscriber_id timestamp rating
0 123 1 1576833135000 1
1 124 1 1576833140000 1
2 125 1 1576833145000 2
3 126 1 1576833150000 2
4 127 1 1576833155000 3
5 128 1 1576833160000 3
6 129 1 1576833165000 4
7 130 1 1576833170000 4
8 131 1 1576833175000 5
9 132 1 1576833180000 5
10 123 2 1576833135000 1
11 124 2 1576833140000 1
12 125 2 1576833145000 1
13 126 2 1576833150000 1
14 127 2 1576833155000 2
15 128 2 1576833160000 2
16 129 2 1576833165000 2
17 130 2 1576833170000 2
18 131 2 1576833175000 3
19 132 2 1576833180000 3
20 133 2 1576833185000 3
21 134 2 1576833190000 3
22 135 2 1576833195000 4
23 136 2 1576833200000 4
24 137 2 1576833205000 4
25 138 2 1576833210000 4
26 139 2 1576833215000 4
27 140 2 1576833220000 5
28 141 2 1576833225000 5
29 142 2 1576833230000 5
30 143 2 1576833235000 5
31 144 2 1576833240000 5
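As an alternative to the explicit loop over users, the same split could be applied through `groupby(...).transform`. This is a sketch rather than the answer's own method: the helper `ratings_for` is a hypothetical name, and the frame is rebuilt in code from the sample data in the question (timestamps 5 seconds apart) so that the block is self-contained.

```python
import numpy as np
import pandas as pd

def ratings_for(n_items, n_ratings=5):
    """Ratings 1..n_ratings for n_items rows ordered oldest to newest."""
    repeat, remainder = divmod(n_items, n_ratings)
    # The `remainder` highest ratings each take one extra row.
    counts = [repeat + (1 if r > n_ratings - remainder else 0)
              for r in range(1, n_ratings + 1)]
    return np.repeat(np.arange(1, n_ratings + 1), counts)

# Sample data from the question: 10 views for subscriber 1, 22 for subscriber 2.
start = 1576833135000
df = pd.DataFrame({
    "subscriber_id": [1] * 10 + [2] * 22,
    "content_id": list(range(123, 133)) + list(range(123, 145)),
    "timestamp": [start + 5000 * i for i in range(10)]
                 + [start + 5000 * i for i in range(22)],
})

# Rows must be in timestamp order within each subscriber before rating.
df = df.sort_values(["subscriber_id", "timestamp"])
df["rating"] = (
    df.groupby("subscriber_id")["timestamp"]
      .transform(lambda s: ratings_for(len(s)))
)

print(df)
```

The ratings are assigned by position within each group, so the sort step is what ties them to recency. Rows with identical timestamps would be rated in whatever order the sort leaves them.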