
I have a CSV file in which the "Location" column is a string that usually starts and ends with a double quote (") and sometimes contains a comma. The problem is that some values are not wrapped in double quotes.

First rows of the CSV

Date,Time,Location,Operator,Flight #,Route,Type,Registration,cn/In,Aboard,Fatalities,Ground,Summary
09/17/1908,17:18,"Fort Myer, Virginia",Military - U.S. Army,,Demonstration,Wright Flyer III,,1,2,1,0,"During a demonstration flight, a U.S. Army flyer flown by Orville Wright nose-dived into the ground from a height of approximately 75 feet, killing Lt. Thomas E. Selfridge who was a passenger. This was the first recorded airplane fatality in history.  One of two propellers separated in flight, tearing loose the wires bracing the rudder and causing the loss of control of the aircraft.  Orville Wright suffered broken ribs, pelvis and a leg.  Selfridge suffered a crushed skull and died a short time later."
07/12/1912,06:30,"AtlantiCity, New Jersey",Military - U.S. Navy,,Test flight,Dirigible,,,5,5,0,"First U.S. dirigible Akron exploded just offshore at an altitude of 1,000 ft. during a test flight."
08/06/1913,,"Victoria, British Columbia, Canada",Private,-,,Curtiss seaplane,,,1,1,0,"The first fatal airplane accident in Canada occurred when American barnstormer, John M. Bryant, California aviator was killed."
09/09/1913,18:30,Over the North Sea,Military - German Navy,,,Zeppelin L-1 (airship),,,20,14,0,The airship flew into a thunderstorm and encountered a severe downdraft crashing 20 miles north of Helgoland Island into the sea. The ship broke in two and the control car immediately sank drowning its occupants.

I managed to handle the values that are wrapped in double quotes and contain a comma, but I don't know how to handle the values without quotes.

Command that I use

awk 'BEGIN{FPAT = "([^,]+)|(\"[^\"]+\")"}{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13}' Airplane_Crashes_and_Fatalities_Since_1908.csv | head -n 5

Output that I have

Date Time Location Operator Flight # Route Type Registration cn/In Aboard Fatalities Ground Summary
09/17/1908 17:18 "Fort Myer, Virginia" Military - U.S. Army Demonstration Wright Flyer III 1 2 1 0 "During a demonstration flight, a U.S. Army flyer flown by Orville Wright nose-dived into the ground from a height of approximately 75 feet, killing Lt. Thomas E. Selfridge who was a passenger. This was the first recorded airplane fatality in history.  One of two propellers separated in flight, tearing loose the wires bracing the rudder and causing the loss of control of the aircraft.  Orville Wright suffered broken ribs, pelvis and a leg.  Selfridge suffered a crushed skull and died a short time later."  
07/12/1912 06:30 "AtlantiCity, New Jersey" Military - U.S. Navy Test flight Dirigible 5 5 0 "First U.S. dirigible Akron exploded just offshore at an altitude of 1,000 ft. during a test flight."   
08/06/1913 "Victoria, British Columbia, Canada" Private - Curtiss seaplane 1 1 0 "The first fatal airplane accident in Canada occurred when American barnstormer, John M. Bryant, California aviator was killed."    
09/09/1913 18:30 Over the North Sea Military - German Navy Zeppelin L-1 (airship) 20 14 0 The airship flew into a thunderstorm and encountered a severe downdraft crashing 20 miles north of Helgoland Island into the sea. The ship broke in two and the control car immediately sank drowning its occupants. 

How can I select only the rows whose "Location" column begins and ends with a double quote (")?

Output that I want

Date,Time,Location,Operator,Flight #,Route,Type,Registration,cn/In,Aboard,Fatalities,Ground,Summary
09/17/1908,17:18,"Fort Myer, Virginia",Military - U.S. Army,,Demonstration,Wright Flyer III,,1,2,1,0,"During a demonstration flight, a U.S. Army flyer flown by Orville Wright nose-dived into the ground from a height of approximately 75 feet, killing Lt. Thomas E. Selfridge who was a passenger. This was the first recorded airplane fatality in history.  One of two propellers separated in flight, tearing loose the wires bracing the rudder and causing the loss of control of the aircraft.  Orville Wright suffered broken ribs, pelvis and a leg.  Selfridge suffered a crushed skull and died a short time later."
07/12/1912,06:30,"AtlantiCity, New Jersey",Military - U.S. Navy,,Test flight,Dirigible,,,5,5,0,"First U.S. dirigible Akron exploded just offshore at an altitude of 1,000 ft. during a test flight."
08/06/1913,,"Victoria, British Columbia, Canada",Private,-,,Curtiss seaplane,,,1,1,0,"The first fatal airplane accident in Canada occurred when American barnstormer, John M. Bryant, California aviator was killed."

Merinoide
  • Please add sample input (no descriptions, no images, no links) and your desired output for that sample input to your question (no comment). – Cyrus Dec 24 '21 at 10:10
  • There are several existing questions about dealing with quoting in CSV files, but this is necessarily unwieldy in Awk. Can you switch to a dedicated tool, or perhaps a language with proper CSV support? Python has a versatile and well-documented `csv` module in its standard library. – tripleee Dec 24 '21 at 10:36
  • If the first two fields are always unquoted, simply `grep -E '^[^,]+,[^,]+,"[^"]+",' file.csv` (or obviously the same regex in Awk if you really insist). – tripleee Dec 24 '21 at 10:38
  • Ed Morton's solution at https://stackoverflow.com/a/29650812/14259465 can help you. – Carlos Pascual Dec 24 '21 at 12:10
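If gawk is not available, tripleee's line-regex idea also works in any POSIX awk, but for this data it needs two adjustments: the Time field can be empty (the 1913 row starts with `08/06/1913,,`), so `[^,]+` must become `[^,]*`, and the header has to be printed explicitly. A sketch, assuming Date and Time never contain commas; `crashes.csv` and the shortened sample rows are placeholders.

```shell
#!/bin/sh
# Portable: any POSIX awk (mawk, BWK awk, busybox awk, gawk).
# Assumption: Date and Time never contain commas, so Location always
# starts right after the second comma. Time may be empty, hence [^,]*.
cat > crashes.csv <<'EOF'
Date,Time,Location,Operator,Flight #,Route,Type,Registration,cn/In,Aboard,Fatalities,Ground,Summary
09/17/1908,17:18,"Fort Myer, Virginia",Military - U.S. Army,,Demonstration,Wright Flyer III,,1,2,1,0,"During a demonstration flight, ..."
07/12/1912,06:30,"AtlantiCity, New Jersey",Military - U.S. Navy,,Test flight,Dirigible,,,5,5,0,"First U.S. dirigible Akron exploded just offshore at an altitude of 1,000 ft. during a test flight."
08/06/1913,,"Victoria, British Columbia, Canada",Private,-,,Curtiss seaplane,,,1,1,0,"The first fatal airplane accident in Canada occurred when American barnstormer, ..."
09/09/1913,18:30,Over the North Sea,Military - German Navy,,,Zeppelin L-1 (airship),,,20,14,0,The airship flew into a thunderstorm ...
EOF

# Print the header plus every row whose 3rd field is a complete "..." value
# (opening quote after the 2nd comma, closing quote followed by a comma).
awk 'NR == 1 || /^[^,]*,[^,]*,"[^"]*",/' crashes.csv
```

The same regex works with `grep -E` if the header is printed separately, for example with `head -n 1 crashes.csv` before the `grep`.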
