How to retain delimiter within list item python

Question

I'm writing a program which jumbles clauses within a text using punctuation marks as delimiters for when to split the text.

At the moment my code has a large list where each item is a group of clauses.

import re
from random import shuffle
clause_split_content = []

text = ["this, is. a test?", "this: is; also. a test!"]

for i in text:
    clause_split = re.split('[,;:".?!]', i)
    clause_split.remove(clause_split[len(clause_split)-1])
    for x in range(0, len(clause_split)):
        clause_split_content.append(clause_split[x])
shuffle(clause_split_content)
print(*clause_split_content, sep='')

At the moment, the result jumbles the text without retaining the punctuation that is used as the delimiter to split it. The output would be something like this:

a test this also this is a test is

I want to retain the punctuation within the final output so it would look something like this:

a test! this, also. this: is. a test? is;

Why split it on the punctuation? Can't you just take each index in the list and append it as a single string? — Captain Caveman, May 27 '22 at 17:06
In my program each item in the list is a line of text within a larger text. However, within each line there is punctuation which I need to be able to split on further. — user7266757, May 27 '22 at 17:14
I'm not certain I understand your question. Is the answer below close? — Captain Caveman, May 27 '22 at 17:18
Does this answer your question? [In Python, how do I split a string and keep the separators?](https://stackoverflow.com/questions/2136556/in-python-how-do-i-split-a-string-and-keep-the-separators) — G. Anderson, May 27 '22 at 17:20
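
The approach from the linked question can be sketched as follows. This is a minimal sketch, not code taken from either thread: when the pattern passed to re.split() contains a capturing group, the matched separators are returned in the result list, so each clause can be glued back onto the delimiter that follows it. The helper name split_keep_delimiters is hypothetical, and the delimiter set is assumed to be the one used in the question.

```python
import re
from random import shuffle

# Assumption: the same delimiter set as in the question.
# The parentheses make re.split() return the delimiters too.
DELIMITERS = r'([,;:".?!])'

def split_keep_delimiters(line):
    """Hypothetical helper: split a line into clauses that keep their trailing delimiter."""
    parts = re.split(DELIMITERS, line)
    # parts alternates: clause, delimiter, clause, delimiter, ..., tail
    clauses = [
        (body + delim).strip()
        for body, delim in zip(parts[0::2], parts[1::2])
    ]
    # Keep any trailing text that has no closing delimiter.
    tail = parts[-1].strip()
    if tail:
        clauses.append(tail)
    return clauses

text = ["this, is. a test?", "this: is; also. a test!"]
clauses = [clause for line in text for clause in split_keep_delimiters(line)]
shuffle(clauses)
print(*clauses)
```

Unlike a findall()-based pattern that requires a closing delimiter, this variant also keeps text at the end of a line that is not followed by punctuation.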

Captain Caveman · Answer 1 · 2022-05-27T19:09:03.610

Option 1: Shuffle words in each index and combine into sentence.

from random import shuffle

count = 0
sentence = ''
new_text = []
text = ["this, is. a test?", "this: is; also. a test!"]

while count < len(text):
    new_text.append(text[count].split())
    shuffle(new_text[count])
    count += 1

for i in new_text:
    for j in i:
        sentence += j + ' '

print(sentence)

Sample shuffled output:

test? this, a is. is; test! this: a also. 
test? a is. this, is; test! a this: also. 
is. test? a this, test! a this: also. is;

Option 2: Combine all elements in list into single element, then shuffle words and combine into a sentence.

import random

count = 0
sentence = ''
new_text = []
text_combined = []
text = ["this, is. a test?", "this: is; also. a test!"]

while count < len(text):
    new_text.append(text[count].split())
    count += 1

for i in new_text:
    for j in i:
        text_combined.append(j)

shuffled_list = random.sample(text_combined, len(text_combined))        

for i in shuffled_list:
    sentence += i + ' '
     
print(sentence)

Sample output:

this, is; also. a this: is. a test? test! 
test! is. this: test? a this, a also. is; 
is. a a is; also. test! test? this, this:

edited May 27 '22 at 19:09

answered May 27 '22 at 17:18

Wow thank you! This works perfectly and is far simpler than how I was trying to do it. – user7266757 May 27 '22 at 17:38
One thing though, in the context of my program sometimes the text could be thousands of lines long, would it be more efficient in that case to use a regex method? – user7266757 May 27 '22 at 17:42
@user7266757 So, you only wanted to shuffle the whole texts and not the words within the texts? This solution has only two possible outcomes: `this: is; also. a test! this, is. a test? ` and `this, is. a test? this: is; also. a test!`. – JANO May 27 '22 at 17:43
I don't think this does what the question asks. It is shuffling the full strings in `text` (there are only two in the example, so it either leaves them as is or swaps them) then simply creating a sentence by appending the full original strings with an intervening space. It has no awareness of the clauses described in the question. – constantstranger May 27 '22 at 17:44
yeah looking at it further, this method doesn't properly shuffle the text how I want. – user7266757 May 27 '22 at 17:48
You are correct, my solution only shuffles the list indexes, not the strings within each list index. OP mentioned they had a 'large list' and the example was a subset. I must have misunderstood the question. – Captain Caveman May 27 '22 at 17:49
Ok, updated answer. I believe this is what you need. – Captain Caveman May 27 '22 at 18:18
@Captain Caveman Your shuffle only rearranges the clauses within a given string in `text`. To do what OP shows in the question, you would probably need to get all the clauses from all the strings in `text` into a single list, which you would then shuffle. – constantstranger May 27 '22 at 18:25
Added a second solution that combines all elements in 'text' list into a single element, then shuffles the words and combines them into a single sentence. – Captain Caveman May 27 '22 at 19:09

JANO · Answer 2 · 2022-05-27T17:41:31.330

I think you are simply using the wrong re function for your purpose. split() discards the separator, but another function such as findall() lets you select each clause together with its separator. For example, the following code produces your desired output:

import re
from random import shuffle

clause_split_content = []

text = ["this, is. a test?", "this: is; also. a test!"]

for i in text:
    words_with_separator = re.findall(r'([^,;:".?!]*[,;:".?!])\s?', i)
    clause_split_content.extend(words_with_separator)
    
shuffle(clause_split_content)
print(*clause_split_content, sep=' ')

Output:

this, this: is. also. a test! a test? is;

The pattern ([^,;:".?!]*[,;:".?!])\s? matches zero or more non-separator characters followed by one separator. Because only that part is inside the capturing group, findall() returns each clause together with its delimiter. The \s? outside the group consumes the space after a separator, so it does not end up at the start of the next clause.
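
To see what the pattern returns before any shuffling, here is a small sketch that applies it to single strings. The second input, which does not end in a delimiter, is an assumption added for illustration; it shows that trailing text with no closing punctuation is not matched.

```python
import re

# Same pattern as above: the group captures a clause plus its delimiter,
# and the optional \s outside the group consumes one following whitespace character.
pattern = re.compile(r'([^,;:".?!]*[,;:".?!])\s?')

with_final_delimiter = pattern.findall("this, is. a test?")
print(with_final_delimiter)
# ['this,', 'is.', 'a test?']

# Assumed input for illustration: no delimiter at the end of the line.
without_final_delimiter = pattern.findall("this, is. a test")
print(without_final_delimiter)
# ['this,', 'is.']
```

If lines in the real text may end without punctuation, that final fragment would need to be handled separately.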

constantstranger · Answer 3 · 2022-05-27T17:47:21.807

Here's a way to do what you've asked:

import re
from random import shuffle
text = ["this, is. a test?", "this: is; also. a test!"]
content = [y for x in text for y in re.findall(r'([^,;:".?!]*[,;:".?!])', x)]
shuffle(content)
print(*content, sep=' ')

Output:

 is;  is.  also.  a test? this,  a test! this:

Explanation:

The regex pattern r'([^,;:".?!]*[,;:".?!])' matches 0 or more non-separator characters followed by a separator character, and findall() returns a list of all such non-overlapping matches. Because the pattern does not consume the whitespace after a separator, most clauses keep a leading space, as the output shows.
The list comprehension iterates over the input strings in the list text and has an inner loop that iterates over the findall() results for each input string, so that we create a single list of every match within every string.
shuffle and print are as in your original code.
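
If the leading spaces in the output are unwanted, one option is to strip each match inside the comprehension. This is a sketch of a small variation, not part of the original answer; it drops the capturing group (findall() then returns the whole match) and calls str.strip() on each clause.

```python
import re
from random import shuffle

text = ["this, is. a test?", "this: is; also. a test!"]
pattern = r'[^,;:".?!]*[,;:".?!]'

# Without stripping, every clause after the first keeps its leading space.
raw = re.findall(pattern, text[0])
print(raw)
# ['this,', ' is.', ' a test?']

# Strip surrounding whitespace from each clause while flattening the list.
content = [clause.strip() for line in text for clause in re.findall(pattern, line)]
shuffle(content)
print(*content, sep=' ')
```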

@user7266757 Do you need more help with this question? – constantstranger May 28 '22 at 17:36
