I have a list of strings (see below). I want to extract elements from the list by searching for two specific tokens (a start token and an end token) and then storing all the strings that lie between them.

For example, given the following list, I want to get all strings between each occurrence of 'RATED' and 'Like'. These subsequences can occur more than once.

['RATED',
 '  Awesome food at a good price .',
 'Delivery was very quick even on New Year’s Eve .',
 'Please try crispy corn and veg noodles From this place .',
 'Taste maintained .',
 'Like',
 '1',
 'Comment',
 '0',
 'Share',
 'Divyansh Agarwal',
 '1 Review',
 'Follow',
 '3 days ago',
 'RATED',
 '  I have tried schezwan noodles and the momos with kitkat shake',
 "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone",
 'Like']

I have tried various approaches, such as regex, but none of them solved the problem.

Amresh Giri, Jan 18, 2019 at 10:52

7 Answers

Best answer

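With a regex, you can do it this way: join the list into a single string, then lazily match everything between a 'RATED' lookbehind and a 'Like' lookahead.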

a= ['RATED','  Awesome food at a good price .', 
 'Delivery was very quick even on New Year’s Eve .', 
 'Please try crispy corn and veg noodles From this place .', 
 'Taste maintained .', 'Like', '1', 'Comment', '0', 
 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', 
 '3 days ago', 'RATED', 
 '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 
 'Like']


import re
string = ' '.join(a)
b = re.compile(r'(?<=RATED).*?(?=Like)').findall(string)
print(b)

Output

['   Awesome food at a good price . Delivery was very quick even on New Year’s Eve . Please try crispy corn and veg noodles From this place . Taste maintained . ',
 "   I have tried schezwan noodles and the momos with kitkat shake And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone "]
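One caveat with joining on spaces: 'RATED' and 'Like' are then matched anywhere in the text, so a review such as 'I Like it' would end a group early, and the element boundaries are lost. A minimal sketch, assuming no element contains a newline and no group is empty, joins on newlines instead and uses re.MULTILINE so the tokens only match as whole elements. The helper name between_tokens and the sample data are hypothetical.

```python
import re


def between_tokens(items, start, end):
    """Hypothetical helper: return the elements between each start/end pair.

    Assumes no element contains a newline and no group is empty.
    """
    # One element per line, so ^ and $ (with re.MULTILINE) mark element boundaries
    text = "\n".join(items)
    pattern = re.compile(
        rf"^{re.escape(start)}\n(.*?)\n{re.escape(end)}$",
        re.MULTILINE | re.DOTALL,  # DOTALL lets .*? span several elements
    )
    # Split each captured block back into the original list elements
    return [block.split("\n") for block in pattern.findall(text)]


reviews = ["RATED", "Good food", "I Like it", "Like", "1", "Share",
           "RATED", "Fast delivery", "Like"]
print(between_tokens(reviews, "RATED", "Like"))
# [['Good food', 'I Like it'], ['Fast delivery']]
```

Because 'Like' must fill a whole line, the 'Like' inside 'I Like it' no longer ends the first group.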
sahasrara62, Jan 18, 2019 at 08:43

I'd suggest reading up on searching with list.index() and on slicing sequence types:

Example:

def group_between(lst, start_token, end_token):
    while lst:
        try:
            # find opening token
            start_idx = lst.index(start_token) + 1
            # find closing token
            end_idx = lst.index(end_token, start_idx)
            # output sublist
            yield lst[start_idx:end_idx]
            # continue with the remaining items
            lst = lst[end_idx+1:]
        except ValueError:
            # begin or end not found, just skip the rest
            break

l = ['RATED','  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', 'Like', 
     '1', 'Comment', '0', 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', '3 days ago',
     'RATED', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 'Like'
]

for i in group_between(l, 'RATED', 'Like'):
    print(i)

The output is:

['  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .']
['  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]
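group_between above builds a new list with lst[end_idx+1:] on every iteration, which copies the remainder of the list each time. Since list.index also accepts a start position, a sketch that keeps an offset instead avoids those copies while producing the same groups. The name group_between_idx and the sample tokens are hypothetical.

```python
def group_between_idx(lst, start_token, end_token):
    """Variant of group_between that tracks a position instead of re-slicing lst."""
    pos = 0
    while True:
        try:
            # list.index(value, start) only searches from position `start` onward
            start_idx = lst.index(start_token, pos) + 1
            end_idx = lst.index(end_token, start_idx)
        except ValueError:
            # no further opening token, or an opening token without a closing one
            return
        yield lst[start_idx:end_idx]
        # resume the search right after the closing token
        pos = end_idx + 1


tokens = ["RATED", "a", "b", "Like", "noise", "RATED", "c", "Like", "RATED", "d"]
for group in group_between_idx(tokens, "RATED", "Like"):
    print(group)
# ['a', 'b']
# ['c']
```

As in the original, the trailing 'RATED', 'd' has no closing 'Like' and is dropped.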
moooeeeep, Jan 22, 2019 at 19:19

An option without flags:

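new_list = []
group = []  # only needed if your_list does not start with 'RATED'

for i in your_list:  # your_list is the input list from the question
    if i == 'RATED':
        group = []  # an opening token starts a fresh group
    elif i == 'Like':
        new_list.append(group[:])  # a closing token stores a copy of the group
    else:
        group.append(i)  # items after 'Like' land here too, but the next 'RATED' discards them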
Mykola Zotko, Jan 18, 2019 at 10:20
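def find_between(old_list, first_word, last_word):
    new_list = []
    flag = False  # True while we are between first_word and last_word
    for i in old_list:
        # compare with ==, not `is`: `is` checks object identity, not string equality
        if i == last_word:
            flag = False  # stop collecting, but keep scanning for further pairs
            continue
        if i == first_word:
            flag = True
            continue
        if flag:
            new_list.append(i)
    return new_list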
Kranthi Kiran, Jan 18, 2019 at 08:15

You could, for example, do this:

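rec = False  # True while we are between 'RATED' and 'Like'
result = []
for s in lst:  # lst is the input list from the question
    if s == 'Like':
        rec = False  # checked before appending, so 'Like' itself is not recorded
    if rec:
        result.append(s)
    if s == 'RATED':
        rec = True  # checked after appending, so 'RATED' itself is not recorded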

Result

#['  Awesome food at a good price .',
# 'Delivery was very quick even on New Year’s Eve .',
# 'Please try crispy corn and veg noodles From this place .',
# 'Taste maintained .',
# '  I have tried schezwan noodles and the momos with kitkat shake',
# "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]
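The flag approach above returns one flat list, so the boundaries between reviews are lost. If each review should stay a separate sublist, a small sketch can use None as the "outside a group" state and start a fresh list on every opening token. The function name collect_groups and the sample list are hypothetical; note that a second 'RATED' before any 'Like' restarts the group.

```python
def collect_groups(items, start="RATED", end="Like"):
    groups = []
    current = None  # None means we are not between a start and an end token
    for s in items:
        if s == start:
            current = []  # opening token: begin a new group (restarts an unfinished one)
        elif s == end:
            if current is not None:
                groups.append(current)  # closing token: keep the finished group
            current = None  # ignore everything until the next opening token
        elif current is not None:
            current.append(s)
    return groups


lst = ["RATED", "a", "b", "Like", "1", "Share", "RATED", "c", "Like", "Like"]
print(collect_groups(lst))
# [['a', 'b'], ['c']]
```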
SpghttCd, Jan 18, 2019 at 08:00

You can use the following code, which uses a simple for loop:

l = ['RATED','  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', 'Like', 
     '1', 'Comment', '0', 'Share', 'Divyansh Agarwal', '1 Review', 'Follow', '3 days ago',
     'RATED', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone", 'Like'
]

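st, ed, aa = None, None, []
for k, v in enumerate(l):
    if v == "RATED":
        st = k  # index of the opening token
    if v == "Like" and st is not None:
        ed = k  # only accept a closing token that follows an opening one
    if st is not None and ed is not None:
        aa.extend(l[st+1:ed])  # items strictly between the two tokens
        st = None
        ed = None

print(aa)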

# ['  Awesome food at a good price .', 'Delivery was very quick even on New Year’s Eve .', 'Please try crispy corn and veg noodles From this place .', 'Taste maintained .', '  I have tried schezwan noodles and the momos with kitkat shake', "And I would say just one word it's best for the best reasonable rates.... Gotta recommend it to everyone"]
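Instead of remembering start and end indices, the same scan can share a single iterator between an outer loop and itertools.takewhile: whenever the outer loop sees 'RATED', takewhile consumes items up to and including the next 'Like', and the outer loop resumes after it. This is a sketch with a hypothetical function name, groups_via_iterator; unlike the code above, an opening token without a closing token yields everything after it.

```python
from itertools import takewhile


def groups_via_iterator(items, start="RATED", end="Like"):
    it = iter(items)
    groups = []
    for s in it:
        if s == start:
            # takewhile pulls from the same iterator, so the closing token is
            # consumed here and the outer loop resumes right after it
            groups.append(list(takewhile(lambda x: x != end, it)))
    return groups


l2 = ["RATED", "a", "Like", "noise", "RATED", "b", "c", "Like"]
groups = groups_via_iterator(l2)
print(groups)  # [['a'], ['b', 'c']]
# flatten if a single list is wanted, as in the answer above
print([x for g in groups for x in g])  # ['a', 'b', 'c']
```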
Akash Swain, Jan 18, 2019 at 12:36

You can use regular expressions. First, join your list with a delimiter that does not occur anywhere in the text (and add it at both ends as well):

import re

delimiter = "#$#"
bigString = delimiter + delimiter.join(yourList) + delimiter

After that, you can use a regular expression:

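# the lookahead (?=#\$#) leaves the closing delimiter unconsumed, so back-to-back groups are all found
results = re.findall(r'#\$#RATED#\$#(.*?)#\$#Like(?=#\$#)', bigString)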

Now you only need to iterate over the results and split each string on the delimiter:

for s in results:
    print(s.split(delimiter))
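This approach only works if the delimiter really never appears inside an element. A small helper can check that assumption before joining. The name pick_delimiter and the candidate strings are hypothetical choices; '\x1f' is the ASCII unit separator, and a single-character delimiter like it also cannot partially overlap text at an element boundary.

```python
def pick_delimiter(items, candidates=("#$#", "\x1f", "\x00")):
    """Return the first candidate that does not occur in any element."""
    for d in candidates:
        if not any(d in s for s in items):
            return d
    raise ValueError("every candidate delimiter occurs in the data")


sample = ["RATED", "Costs 5#$#6", "Like"]
delimiter = pick_delimiter(sample)
print(repr(delimiter))  # '\x1f'
```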
Luka Dumančić, Jan 18, 2019 at 09:40