Pandas: assign points from 5 to 1 based on timestamp, with the most recent entries getting the most points

科技网编 · 2023-04-09 07:38 · 1950
Question:

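While designing a recommender system, I ran into a situation where the collaborative filtering implementation needs votes or something similar. However, our system has no field for ratings or votes. I would like to infer a comparable rating/vote from the timestamp at which a user watched a show.

This is what the view history looks like.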

subscriber_id  content_id      timestamp
1   123 1576833135000    
1   124 1576833140000    
1   125 1576833145000    
1   126 1576833150000    
1   127 1576833155000    
1   128 1576833160000    
1   129 1576833165000    
1   130 1576833170000    
1   131 1576833175000    
1   132 1576833180000    
2   123 1576833135000    
2   124 1576833140000    
2   125 1576833145000    
2   126 1576833150000    
2   127 1576833155000    
2   128 1576833160000    
2   129 1576833165000    
2   130 1576833170000    
2   131 1576833175000    
2   132 1576833180000    
2   133 1576833185000    
2   134 1576833190000    
2   135 1576833195000    
2   136 1576833200000    
2   137 1576833205000    
2   138 1576833210000    
2   139 1576833215000    
2   140 1576833220000    
2   141 1576833225000    
2   142 1576833230000    
2   143 1576833235000    
2   144 1576833240000  

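I want to assign each entry a number from 5 to 1 (5 being the most recent). I have already implemented a ranking, but it does not map onto that range.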

df1['rank'] = df1.sort_values(['subscriber_id','timestamp']) \
                        .groupby(['subscriber_id'])['timestamp'] \
                        .rank(method='max').astype(int) 
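
To make the problem concrete, here is a minimal sketch of what the ranking above produces. It uses a shortened sample (only the first seven views of subscriber 1 from the table above), so the frame itself is an assumption made for illustration.

```python
import pandas as pd

# Shortened sample: the first seven views of subscriber 1 from the table above
df1 = pd.DataFrame({
    'subscriber_id': [1] * 7,
    'content_id': [123, 124, 125, 126, 127, 128, 129],
    'timestamp': [1576833135000 + 5000 * i for i in range(7)],
})

df1['rank'] = (df1.sort_values(['subscriber_id', 'timestamp'])
                  .groupby(['subscriber_id'])['timestamp']
                  .rank(method='max').astype(int))

# Ranks run from 1 up to the group size, not from 1 to 5
print(df1['rank'].tolist())  # [1, 2, 3, 4, 5, 6, 7]
```

The rank grows with the number of rows per subscriber, so it still has to be squeezed into the 1 to 5 range.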

Expected output:

subscriber_id  content_id      timestamp    rating
1   123 1576833135000       1
1   124 1576833140000       1
1   125 1576833145000       2
1   126 1576833150000       2
1   127 1576833155000       3
1   128 1576833160000       3
1   129 1576833165000       4
1   130 1576833170000       4
1   131 1576833175000       5
1   132 1576833180000       5
2   123 1576833135000       1
2   124 1576833140000       1
2   125 1576833145000       1
2   126 1576833150000       1
2   127 1576833155000       2
2   128 1576833160000       2
2   129 1576833165000       2
2   130 1576833170000       2
2   131 1576833175000       3
2   132 1576833180000       3
2   133 1576833185000       3
2   134 1576833190000       3
2   135 1576833195000       4
2   136 1576833200000       4
2   137 1576833205000       4
2   138 1576833210000       4
2   139 1576833215000       4
2   140 1576833220000       5
2   141 1576833225000       5
2   142 1576833230000       5
2   143 1576833235000       5
2   144 1576833240000       5

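Any help would be greatly appreciated!

Answer (1 vote):

Now it makes sense. The solution is to build the rating list from the modulo: divide the number of rows for the selected user by 5, repeat each rating that many times, and give the leftover rows (the remainder) to the highest ratings.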
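
Before the full code, the arithmetic behind that rule can be sketched on its own. The helper name `rating_counts` and its `levels` parameter are hypothetical and appear only in this illustration; it returns how many rows receive each rating from 1 to 5.

```python
def rating_counts(n_items, levels=5):
    """Return how many rows receive each rating 1..levels (hypothetical helper)."""
    base, remainder = divmod(n_items, levels)
    # The top `remainder` ratings each get one extra row
    return [base + (1 if level > levels - remainder else 0)
            for level in range(1, levels + 1)]

print(rating_counts(10))  # [2, 2, 2, 2, 2]
print(rating_counts(22))  # [4, 4, 4, 5, 5]
```

These counts match the expected output in the question: subscriber 1 has 10 rows (two per rating), and subscriber 2 has 22 rows, where ratings 4 and 5 take the two leftover rows.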

import pandas as pd
from io import StringIO

data = StringIO("""
content_id subscriber_id timestamp
123 1 1576833135000
124 1 1576833140000
125 1 1576833145000
126 1 1576833150000
127 1 1576833155000
128 1 1576833160000
129 1 1576833165000
130 1 1576833170000
131 1 1576833175000
132 1 1576833180000
123 2 1576833135000
124 2 1576833140000
125 2 1576833145000
126 2 1576833150000
127 2 1576833155000
128 2 1576833160000
129 2 1576833165000
130 2 1576833170000
131 2 1576833175000
132 2 1576833180000
133 2 1576833185000
134 2 1576833190000
135 2 1576833195000
136 2 1576833200000
137 2 1576833205000
138 2 1576833210000
139 2 1576833215000
140 2 1576833220000
141 2 1576833225000
142 2 1576833230000
143 2 1576833235000
144 2 1576833240000
""")

# load data into a data frame
df = pd.read_csv(data, sep=' ')
# get unique users
user_list = df['subscriber_id'].unique()

# collect per-user results
frames = []
for user in user_list:
    # select the rows for one user, oldest first
    # (copy so that adding a column does not touch df)
    df2 = df[df['subscriber_id'] == user].sort_values('timestamp').copy()
    items_number = df2.shape[0]
    modulo_remainder = items_number % 5
    ranks_repeat = items_number // 5

    # create rating list based on modulo: every rating 1..5 is repeated
    # ranks_repeat times, and the highest `modulo_remainder` ratings get
    # one extra entry (this also covers users with fewer than 5 rows)
    rating = []
    for i in range(1, 6):
        extra = 1 if i > 5 - modulo_remainder else 0
        rating.extend([i] * (ranks_repeat + extra))

    df2.insert(3, 'rating', rating)
    frames.append(df2)

# DataFrame.append was removed in pandas 2.0, so concatenate instead
results = pd.concat(frames)
print(results)

Result:

   content_id subscriber_id      timestamp rating
0         123             1  1576833135000      1
1         124             1  1576833140000      1
2         125             1  1576833145000      2
3         126             1  1576833150000      2
4         127             1  1576833155000      3
5         128             1  1576833160000      3
6         129             1  1576833165000      4
7         130             1  1576833170000      4
8         131             1  1576833175000      5
9         132             1  1576833180000      5
10        123             2  1576833135000      1
11        124             2  1576833140000      1
12        125             2  1576833145000      1
13        126             2  1576833150000      1
14        127             2  1576833155000      2
15        128             2  1576833160000      2
16        129             2  1576833165000      2
17        130             2  1576833170000      2
18        131             2  1576833175000      3
19        132             2  1576833180000      3
20        133             2  1576833185000      3
21        134             2  1576833190000      3
22        135             2  1576833195000      4
23        136             2  1576833200000      4
24        137             2  1576833205000      4
25        138             2  1576833210000      4
26        139             2  1576833215000      4
27        140             2  1576833220000      5
28        141             2  1576833225000      5
29        142             2  1576833230000      5
30        143             2  1576833235000      5
31        144             2  1576833240000      5
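
The same rule can also be written more compactly. The following is only a sketch, not the answer's method: it relies on `np.array_split`, which gives the first `n % 5` chunks one extra element, so splitting the rows newest-first hands the leftover rows to the highest ratings. The helper names `ratings_for` and `add_ratings` and the small `demo` frame (with made-up timestamps) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def ratings_for(n_items, levels=5):
    """Ratings for n_items rows ordered oldest to newest (hypothetical helper)."""
    # np.array_split gives the first n_items % levels chunks one extra element,
    # so splitting newest-first hands the extra rows to the highest ratings
    chunks = np.array_split(np.arange(n_items), levels)
    newest_first = np.concatenate(
        [np.full(len(chunk), levels - i) for i, chunk in enumerate(chunks)])
    return newest_first[::-1]

def add_ratings(df):
    """Return a sorted copy of df with a 'rating' column added."""
    out = df.sort_values(['subscriber_id', 'timestamp']).reset_index(drop=True)
    # size() lists groups in ascending subscriber_id order, matching the sort above
    sizes = out.groupby('subscriber_id').size()
    out['rating'] = np.concatenate([ratings_for(n) for n in sizes])
    return out

# Small made-up frame: 7 views for subscriber 1, 3 views for subscriber 2
demo = pd.DataFrame({
    'subscriber_id': [1] * 7 + [2] * 3,
    'content_id': list(range(123, 130)) + list(range(123, 126)),
    'timestamp': list(range(7)) + list(range(3)),
})
print(add_ratings(demo)['rating'].tolist())  # [1, 2, 3, 4, 4, 5, 5, 3, 4, 5]
```

Unlike the loop in the answer, this version returns the rows re-sorted with a fresh index, which may or may not be what you want if other code depends on the original index.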