python - Cleaning files with multiple duplicates
I had to make a copy of an emergency database from a website.
I scraped it with Python functions using admin access. The database is formatted as:
name: phone number: has played game:
Everything was copied into a .txt file.
Sometimes the file contains errors like:
name: name: name: bob
How can I clean up this mess with a shell command or Python, keeping the fields in the same order (I still want name, phone number, etc.)?
Say you have this in db.txt:
phone number: phone number: phone number: phone number: 0118521358 name: name: name: name: bob has played game: name: name: name: name: bob
You can try a small script like this:
import re

# list of column labels that appear in the db
# (the question's format has three: name, phone number, has played game)
db_header = ['name:', 'phone number:', 'has played game:']

# open the database file with discrepancies and create a new output file
with open('db.txt', 'r') as file_with_errors, open('new_file', 'w') as new_file:
    # iterate through each line in the database file
    for line in file_with_errors:
        # collapse runs like "name: name: name:" into a single "name: "
        for col_name in db_header:
            line = re.sub(r"(%s[ ]*)+" % re.escape(col_name), col_name + ' ', line)
        new_file.write(line)  # write the cleaned line to the new file
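Since the question also asks about a shell command, here is a minimal sed sketch of the same idea. It assumes the file uses exactly the labels from the question and that each repeated label is followed by a space; `db.txt` and `new_file` are the names from the question:

```shell
# collapse each run of a repeated label into a single occurrence
printf 'phone number: phone number: 0118521358 name: name: bob\n' |
  sed -E 's/(name: )+/name: /g; s/(phone number: )+/phone number: /g; s/(has played game: )+/has played game: /g'
```

For the real file you would run the same sed command as `sed -E '...' db.txt > new_file`, which preserves the line order just like the Python version.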