python - Cleaning files with multiple duplicates


I made an emergency copy of a database from a website.

I scraped it with functions written in Python, using admin credentials. The database is formatted as:

name:  phone number:  has played game:  

Everything was copied into a .txt file, but sometimes I find errors in the file like:

name: name: name: bob 

How can I clean up this mess using a shell command or Python, while keeping the same field order (I still want name, phone number, etc.)?

Say db.txt contains:

phone number:  phone number: phone number: phone number: 0118521358 name: name: name: name: bob has played game: name: name: name: name: bob 

You can try a little script like this:

import re

# create a new file for the cleaned output
new_file = open("new_file", 'w')
# open the database file that contains the discrepancies
file_with_error = open('db.txt', 'r')
# list of the column labels used in the db
db_header = ['name:', 'phone number:', 'has played game:']
# iterate through each line of the database file and
# collapse every run of repeated labels into a single label
for line in file_with_error:
    for col_name in db_header:
        line = re.sub("(%s[ ]*)+" % (col_name,), col_name, line)
    new_file.write(line)  # write the cleaned line to the new file
file_with_error.close()
new_file.close()
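As a quick sanity check, here is a minimal sketch applying the same regex substitution directly to the sample line from db.txt above. It replaces each run of repeated labels with the label plus a single space, a small tweak that preserves the spacing between the label and its value:

```python
import re

# Sample line copied from db.txt above
line = ("phone number:  phone number: phone number: phone number: 0118521358 "
        "name: name: name: name: bob has played game: name: name: name: name: bob")

# Collapse each run of repeated labels down to one label and a space
for col_name in ['name:', 'phone number:', 'has played game:']:
    line = re.sub("(%s[ ]*)+" % (col_name,), col_name + " ", line)

print(line)
# → phone number: 0118521358 name: bob has played game: name: bob
```

If you prefer the shell, a sed one-liner along the same lines, e.g. `sed -E 's/(name: *)+/name: /g' db.txt`, should collapse one label's repeats; you would chain one substitution per column label.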
