I have the following DataFrame. The genres column contains a list of several dictionaries.

index  title    genres
0      Avatar                                       [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
1      Pirates of the Caribbean: At World's End     [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}]
2      Spectre                                      [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
3      The Dark Knight Rises                        [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]
4      John Carter                                  [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]

I would like a DataFrame like this:

     Title   Name
     Avatar  Action
     Avatar  Adventure
     Avatar  Fantasy
     Avatar  Science Fiction
     Pirates.. Adventure
     Pirates.. Fantasy
     ...

I hope the question is clear. This is my first time asking a question. Thank you.

Koray Can Canut, 8 Feb 2020 at 16:49

4 Answers

Best answer

Suppose we have a DataFrame df:

df
    title   genres
0   Avatar  [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
1   Pirates of the Caribbean: At World's End    [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}]
2   Spectre [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
3   The Dark Knight Rises   [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]
4   John Carter [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]

Then we can do:

df["genres"] = df["genres"].apply(lambda row: [genre["name"] for genre in row])
df.explode("genres")
    title   genres
0   Avatar  Action
0   Avatar  Adventure
0   Avatar  Fantasy
0   Avatar  Science Fiction
1   Pirates of the Caribbean: At World's End    Adventure
1   Pirates of the Caribbean: At World's End    Fantasy
1   Pirates of the Caribbean: At World's End    Action
2   Spectre Action
2   Spectre Adventure
2   Spectre Crime
3   The Dark Knight Rises   Action
3   The Dark Knight Rises   Crime
3   The Dark Knight Rises   Drama
3   The Dark Knight Rises   Thriller
4   John Carter Action
4   John Carter Adventure
4   John Carter Science Fiction
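
The result above keeps the original column names and a repeated index. As a minimal sketch, assuming the genres column already holds Python lists of dictionaries (not strings), the same approach can be extended to produce the Title/Name layout requested in the question. The two sample rows are taken from the question; the variable name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"},
         {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}],
        [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"},
         {"id": 80, "name": "Crime"}],
    ],
})

# assign() returns a new DataFrame, so df keeps its list-of-dict column
result = df.assign(genres=df["genres"].apply(lambda row: [g["name"] for g in row]))

# One row per genre, renamed to the requested headers, with a fresh 0..n-1 index
result = (
    result.explode("genres")
    .rename(columns={"title": "Title", "genres": "Name"})
    .reset_index(drop=True)
)
print(result)
```

Note that DataFrame.explode requires pandas 0.25 or later.
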
Sergey Bushmanov, 8 Feb 2020 at 14:58

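import ast

import pandas as pd

# Example input built from two rows of the question, with genres stored as strings
df = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]',
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]',
    ],
})
df_list = []

# Loop over each row and read the values of the title and genres columns
for index, row in df.iterrows():
    title = row['title']
    gn = row['genres']
    genres = ast.literal_eval(gn)  # convert the string into a list of dicts

    # Append one [title, genre name] pair per genre
    for genre in genres:
        df_list.append([title, genre['name']])

out_df = pd.DataFrame(df_list, columns=['Title', 'Name'])
print(out_df.head())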

If the values in the genres column are strings, we first have to convert them to lists. We use ast.literal_eval() for that.
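
The string-to-list conversion described here can also be combined with explode instead of an explicit loop. The following is only a sketch under the assumption that every value in genres is a string holding a valid Python literal; genre_names is a hypothetical helper introduced for illustration, and the sample row comes from the question.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "title": ["Spectre"],
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]'],
})


def genre_names(raw):
    """Hypothetical helper: parse the string into a list of dicts and keep the names."""
    return [genre["name"] for genre in ast.literal_eval(raw)]


out_df = (
    df.assign(Name=df["genres"].apply(genre_names))  # new column of name lists
    .explode("Name")                                 # one row per genre name
    .rename(columns={"title": "Title"})[["Title", "Name"]]
    .reset_index(drop=True)
)
print(out_df)
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for data read from a file.
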

Amal Thachappilly, 8 Feb 2020 at 15:24

I would do this:

import pandas as pd

df = pd.DataFrame({"title":["Avatar","Spectre"],"genres":[
                    [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}],
                    [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
                    ]})

print(df)

     title                                             genres
0   Avatar  [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...
1  Spectre  [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...

Get only the names from the "genres" column:

df["genres"] = df["genres"].apply(lambda x:[y.get("name") for y in x])

Create a new DataFrame containing only the names:

df1 = pd.DataFrame(df["genres"].values.tolist())
df1.columns = ["name_{}".format(x) for x in range(len(df1.columns))]

Combine both:

df = pd.concat([df[["title"]],df1],axis=1)

Melt:

df.melt(id_vars="title",value_vars=df.columns[1:],value_name="name")[["title","name"]].dropna().set_index("title").sort_index()



                 name
title
Avatar            Action
Avatar         Adventure
Avatar           Fantasy
Avatar   Science Fiction
Spectre           Action
Spectre        Adventure
Spectre            Crime
kleerofski, 8 Feb 2020 at 14:34

import pandas as pd

title = ["Avatar", "Pirates of the Caribbean: At World's End", "Spectre", "The Dark Knight Rises", "John Carter"]
genres = [[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}],
          [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}],
          [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}],
          [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}],
          [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]]
df = pd.DataFrame({"title": title,
                   "genres": genres})

Exploding the series of dictionaries:

genres_list = df["genres"].apply(lambda x: [y["name"] for y in x ]).explode()
genres_list

0             Action
0          Adventure
0            Fantasy
0    Science Fiction
1          Adventure
1            Fantasy
1             Action
2             Action
2          Adventure
2              Crime
3             Action
3              Crime
3              Drama
3           Thriller
4             Action
4          Adventure
4    Science Fiction
Name: genres, dtype: object

Expanding the titles:

Each element of df["title"] is repeated n_i times, where n_i is the length of the corresponding list of dictionaries. See the documentation.

title_rep = df["title"].repeat(df["genres"].apply(lambda x: len(x)))
title_rep

0                                      Avatar
0                                      Avatar
0                                      Avatar
0                                      Avatar
1    Pirates of the Caribbean: At World's End
1    Pirates of the Caribbean: At World's End
1    Pirates of the Caribbean: At World's End
2                                     Spectre
2                                     Spectre
2                                     Spectre
3                       The Dark Knight Rises
3                       The Dark Knight Rises
3                       The Dark Knight Rises
3                       The Dark Knight Rises
4                                 John Carter
4                                 John Carter
4                                 John Carter
Name: title, dtype: object
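
To illustrate the repetition rule on its own, here is a small sketch of Series.repeat with one count per element. The names titles, counts, and repeated are illustrative only; the counts 4 and 3 are the genre-list lengths for Avatar and Spectre in the question's data.

```python
import pandas as pd

titles = pd.Series(["Avatar", "Spectre"])
counts = [4, 3]  # one repeat count per element, here the number of genres

# Each value is repeated according to its count, and its index label is repeated too
repeated = titles.repeat(counts)
print(repeated.tolist())        # ['Avatar', 'Avatar', 'Avatar', 'Avatar', 'Spectre', 'Spectre', 'Spectre']
print(repeated.index.tolist())  # [0, 0, 0, 0, 1, 1, 1]
```

Because the index labels are repeated along with the values, the repeated titles carry the same index as the exploded genres.
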

Combine:

pd.DataFrame({"title": title_rep,
              "genres": genres_list})

Returns:

            title   genres
0   Avatar  Action
0   Avatar  Adventure
0   Avatar  Fantasy
0   Avatar  Science Fiction
1   Pirates of the Caribbean: At World's End    Adventure
1   Pirates of the Caribbean: At World's End    Fantasy
1   Pirates of the Caribbean: At World's End    Action
2   Spectre Action
2   Spectre Adventure
2   Spectre Crime
3   The Dark Knight Rises   Action
3   The Dark Knight Rises   Crime
3   The Dark Knight Rises   Drama
3   The Dark Knight Rises   Thriller
4   John Carter Action
4   John Carter Adventure
4   John Carter Science Fiction
akilat90, 9 Feb 2020 at 17:00