I have a text like this:

...
Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

sdgsdg
dgds
hfdhdf
h
fdh
dfh Sentence two. gdjshagjhsdga sdgjhsdkjgh adskjghdsa
gs a
gfdgfdhfdhh
...

I need to extract from this paragraph the text that lies between two strings (each is actually a sentence): `Sentence one.` and `Sentence two.`.

Could you please help me work out how to extract it?

Thanks

user984621
  • You didn't include what you have so far? – Jerry May 21 '13 at 16:27
  • I doubt you'll be able to differentiate an arbitrary real sentence from gibberish with a reasonable regular expression. Some kind of simple parser is probably going to be your best bet. – AndyPerfect May 21 '13 at 16:31
  • `/Sentence one(.*?)Sentence two/m` will work, but only if `Sentence one` and `Sentence two` are exact and not nested. – Explosion Pills May 21 '13 at 16:34
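To illustrate the pattern from the last comment, here is a minimal Ruby sketch. The helper name `text_between` and the shortened sample text are assumptions made for illustration; they are not from the question. The markers are passed through `Regexp.escape` so their periods match literally, and the `m` flag lets `.` match newlines so the capture can span several lines.

```ruby
# Shortened sample text modelled on the question (an assumption, not the real data).
text = <<~TEXT
  ...
  Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

  sdgsdg
  dfh Sentence two. gdjshagjhsdga
  ...
TEXT

# Hypothetical helper: returns the text between two literal markers,
# or nil when the markers are not found in that order.
def text_between(text, start_marker, end_marker)
  # Regexp.escape makes the periods literal; /m lets "." match newlines;
  # ".*?" is lazy, so the capture stops at the first end marker.
  pattern = /#{Regexp.escape(start_marker)}(.*?)#{Regexp.escape(end_marker)}/m
  match = text.match(pattern)
  match && match[1]
end

between = text_between(text, "Sentence one.", "Sentence two.")
puts between.inspect
```

The capture keeps the surrounding whitespace and newlines; calling `strip` on the result would remove them if that is not wanted.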

3 Answers

Looking at what you have, the start and end of your sentence are a capital letter and a period, respectively. You can construct a regular expression that pulls out the text between a capital letter and the first period that comes after.

But this may be a contrived example; it looks like you may have typed random keys in the middle of the keyboard, so these may not be the characteristics of your actual gibberish.

John

Try something like this: `([A-Z]{1}.*\.)?`
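A small sketch of how a pattern of this shape behaves in Ruby may help. It assumes the intent is "a capital letter up to a period"; the sample line is made up, and the sketch drops the redundant `{1}` and the optional group. Note that `.*` is greedy and runs to the last period on the line, whereas `.*?` stops at the first one.

```ruby
# Made-up single line containing both sentences (an assumption for illustration).
line = "Sentence one. hsjdhsd Sentence two. gdjsh"

# Greedy: from the first capital letter to the LAST period on the line.
greedy = line[/[A-Z].*\./]

# Lazy: from the first capital letter to the FIRST period after it.
lazy = line[/[A-Z].*?\./]

# scan collects every non-overlapping lazy match.
sentences = line.scan(/[A-Z].*?\./)

puts greedy              # prints "Sentence one. hsjdhsd Sentence two."
puts lazy                # prints "Sentence one."
puts sentences.inspect   # prints ["Sentence one.", "Sentence two."]
```

Without the `m` flag, `.` does not match a newline in Ruby, so neither form crosses a line break.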

Lifeweaver

Use a Conditional Flip-Flop Expression

Given your corpus as defined above:

ruby -ne 'puts $_ if /Sentence/ ... /Sentence/' /tmp/corpus

will output:

Sentence one. hsjdhsd jghdsjghjdskhgjksdh kjghdsjkg

sdgsdg
dgds
hfdhdf
h
fdh
dfh Sentence two. gdjshagjhsdga sdgjhsdkjgh adskjghdsa
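The same flip-flop idea can be sketched inside a script rather than as a one-liner. This version assumes the text is already in a string (the shortened `corpus` below is made up) and tests each line explicitly with `include?` instead of relying on the implicit `$_` match that `-n` provides.

```ruby
# Shortened, made-up corpus modelled on the question.
corpus = <<~TEXT
  ...
  Sentence one. hsjdhsd
  sdgsdg
  dfh Sentence two. gdjsh
  gs a
  ...
TEXT

selected = []
corpus.each_line do |line|
  # Three-dot flip-flop: switches on at a line containing the first marker
  # and stays on through the next LATER line containing the second marker.
  if (line.include?("Sentence one."))...(line.include?("Sentence two."))
    selected << line
  end
end

puts selected
```

As in the output above, the flip-flop selects whole lines, so the text following `Sentence two.` on the last selected line is still included.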
Todd A. Jacobs