Wednesday, October 26, 2005
sed dos2unix
Sed dos2unix
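One option uses sed to strip the last character on each line:
[user@host] # sed 's/.$//' unixfile > unixfile.sed
Unfortunately, this removes the last character of every line whether or not it is a carriage return, so on lines that do not end in ^M it removes characters that you may not want removed (e.g. the "T" in "CDT").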
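To see the problem concretely, here is a minimal sketch that feeds sed one DOS-terminated line and one Unix-terminated line. The sample text and the function name strip_last_char are illustrative only and do not come from the original post.

```sh
#!/bin/sh
# strip_last_char: the "delete the last character" approach, reading stdin.
strip_last_char() {
    sed 's/.$//'
}

# A line ending in CR/LF loses only the carriage return
# (od -c shows the raw bytes that are left)...
printf 'Wed Oct 26 CDT\r\n' | strip_last_char | od -c

# ...but a line that already ends in a bare LF loses its final "T".
printf 'Wed Oct 26 CDT\n' | strip_last_char
```

The second command prints the line with "CD" at the end, which is exactly the damage described above.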
Another option uses sed again, but strips the specific character instead of the last character on each line:
[user@host] # sed 's/^M//g' unixfile > unixfile.sed2
One very important item to understand about this command is that the "^M" (control character) is not generated by typing the "^" character, and then the "M" character from your keyboard. Instead, it is accomplished by typing Ctrl-V and then Ctrl-M (the Ctrl key and the V or M key are pressed simultaneously). Typing this sequence will produce the "^M" (control character), which allows sed to locate and process it as instructed.
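If typing Ctrl-V Ctrl-M is inconvenient (inside a script, for example), one alternative is to let the shell generate the carriage return. This is only a sketch and not part of the original post; the variable name CR and the function name strip_cr are made up here.

```sh
#!/bin/sh
# Hold a literal carriage return in a variable. printf understands \r
# even where a particular sed does not.
CR=$(printf '\r')

# strip_cr: remove a carriage return only when it is the last
# character of a line. Reads stdin, writes stdout.
strip_cr() {
    sed "s/${CR}\$//"
}

# Usage: strip_cr < unixfile > unixfile.sed2
```

Because the pattern is anchored with $, a line that has no trailing ^M passes through unchanged.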
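The most desirable option is to run the dos2unix utility against the file:
[user@host] # dos2unix unixfile unixfile.dos2unix
(This two-argument form is the Solaris-style syntax. The dos2unix package found on most Linux systems converts files in place by default and writes to a new file with "dos2unix -n infile outfile".)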
####################################################################################################
Convert dos text files to unix, and vice versa:
dos2unix file.txt
unix2dos file.txt
tr -d '\015' < win.txt > unix.txt # if you can't find dos2unix
sed -e 's/$/\r/' < unix.txt > win.txt # if you can't find unix2dos
####################################################################################################
With vi
Notice that some programs are not consistent in the way they insert line breaks, so you end up with some lines that have both a ^M (carriage return) and a newline, and some lines that have a ^M and no newline (and so blend into one). There are two steps to clean this up.
1. replace all extraneous ^M:
:%s/^M$//g
BE SURE YOU MAKE the ^M USING "CTRL-V CTRL-M" NOT BY TYPING "CARET M"! This expression will replace all the ^M's that have line breaks after them with nothing. (The dollar ties the search to the end of a line.)
2. replace all ^M's that need to have line breaks:
:%s/^M/\r/g
Once again: BE SURE YOU MAKE the ^M USING "CTRL-V CTRL-M" NOT BY TYPING "CARET M"! This expression will replace all the ^M's that didn't have line breaks after them with a line break. (In the replacement part of a substitute, Vim treats \r as "start a new line".)
It also works with
:%s/\r//g
I think using this command is easier.
To convert to a Unix file (then save with :w):
:set ff=unix
To convert to a Windows file (then save with :w):
:set ff=dos
Or
:set fileformat=dos
:set fileformat=unix
with:
:%s/^M/\r/g
works perfectly !!!
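Outside of vi, the two-step cleanup described above can be sketched as a shell pipeline. This is an illustration under the assumption that every stray ^M should become a line break; the function name fix_mixed_breaks is hypothetical.

```sh
#!/bin/sh
# A literal carriage return for use inside the sed pattern.
CR=$(printf '\r')

# fix_mixed_breaks: read stdin, write cleaned text to stdout.
fix_mixed_breaks() {
    # Step 1 (sed): drop each ^M that sits directly before a newline.
    # Step 2 (tr):  turn every remaining ^M into a newline.
    sed "s/${CR}\$//" | tr '\r' '\n'
}

# Usage: fix_mixed_breaks < mixed.txt > clean.txt
```

The order matters: running tr first would turn every CR/LF pair into two newlines and leave blank lines behind.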
####################################################################################################
Quick Script
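#!/bin/bash
# To replace dos linebreaks for Unix compatibility
echo "This script will replace the ^M line breaks from dos."
echo -n "Enter filename without extension: "
read -r file
echo -n "Enter extension: "
read -r ext
# Write to a second file named <name>2.<ext>. The braces matter: $file2
# would be read as an (unset) variable called "file2".
# Note: \r in the pattern is understood by GNU sed.
sed 's/\r$//' "$file.$ext" > "${file}2.$ext"
cp -f "${file}2.$ext" "$file.$ext"
rm -f "${file}2.$ext"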
This script is the same as before, just minus one step.
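#!/bin/bash
# To replace dos linebreaks for Unix compatibility
echo "This script will replace the ^M line breaks from dos."
echo -n "Enter filename: "
read -r file
# Append the "2" so that a name containing a directory still works
# (a "2" prefix would turn dir/name into 2dir/name).
sed 's/\r$//' "$file" > "${file}2"
cp -f "${file}2" "$file"
rm -f "${file}2"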
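Here's another little script
#!/bin/sh
FILE="$1"
# Build a literal carriage return; not every sed understands \r.
CR=$(printf '\r')
# Use sed with the -i option for in-place editing.
# The empty '' (no backup suffix) is BSD/macOS syntax; with GNU sed, drop it.
sed -i '' "s/${CR}//g" "$FILE"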
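The scripts above all follow the same pattern: write the converted text to a second file, then put it back over the original. A slightly more defensive sketch of that pattern follows. It assumes the mktemp command is available, and the function name dos2unix_inplace is invented for this example.

```sh
#!/bin/sh
# A literal carriage return for use inside the sed pattern.
CR=$(printf '\r')

# dos2unix_inplace FILE: strip trailing carriage returns from FILE.
dos2unix_inplace() {
    file=$1
    tmp=$(mktemp) || return 1
    # Only touch the original if sed succeeded.
    if sed "s/${CR}\$//" "$file" > "$tmp"; then
        # Copy the contents back rather than renaming, so the original
        # file keeps its permissions and owner.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage: dos2unix_inplace report.txt
```

Using a temporary name from mktemp avoids clobbering an unrelated file that happens to share the hand-built name.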
####################################################################################################
Just trim
From the UNIX shell: tr -d "\015" < inputfile > outputfile
E.g.: tr -d "\015" < dosformatted.txt > unixformatted.txt
####################################################################################################
Let's replace it with a new line!
sed "s/^M/\n/g" replaces the ^M with a linux newline (\n in the replacement is understood by GNU sed).
The ^M is written Ctrl-V Ctrl-M, not with the caret char.
####################################################################################################
To address the problem of ^M (<ctrl>M) characters in multiple files, the following single-line shell command will be helpful:
for name in *.dat ; do sed 's/^M//' "$name" > "${name%.dat}N.dat" && mv "${name%.dat}N.dat" "$name" ; done
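A variation on that loop, one that copes with spaces in file names and leaves files without any ^M untouched, might look like the sketch below. The function name convert_all is made up, and it takes the file list as arguments instead of hard-coding *.dat.

```sh
#!/bin/sh
# A literal carriage return, so nothing has to be typed with Ctrl-V Ctrl-M.
CR=$(printf '\r')

# convert_all FILE...: strip trailing ^M from each file that contains one.
convert_all() {
    for name in "$@"; do
        # grep -q exits 0 only if the file contains a carriage return.
        if grep -q "$CR" "$name"; then
            # Replace the original only if sed succeeded.
            sed "s/${CR}\$//" "$name" > "$name.tmp" && mv "$name.tmp" "$name"
        fi
    done
}

# Usage: convert_all *.dat
```

Skipping clean files keeps their modification times intact, which can matter if something like make is watching them.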
####################################################################################################
Now in C
/* A program to strip carriage returns (0x0d) from a file */
#include <stdio.h>

int main(int argc, char *argv[])
{
    FILE *in, *tmp;
    int byte;

    printf("Hello! Are you ready to get rid of those nasty crlf?\n");
    if (argc < 2) {
        fprintf(stderr, "You need to specify an input file\n");
        return 1;
    }
    /* Open the input first so a bad file name does not leave tmp.tmp behind */
    if ((in = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "We could not open the input file called %s\n", argv[1]);
        return 3;
    }
    if ((tmp = fopen("tmp.tmp", "wb")) == NULL) {
        fprintf(stderr, "We could not open the temporary file called tmp.tmp\n");
        fclose(in);
        return 2;
    }
    /* Copy every byte except carriage returns */
    while ((byte = getc(in)) != EOF) {
        if (byte != 0x0d)
            putc(byte, tmp);
    }
    fclose(in);
    if (fclose(tmp) != 0) {
        fprintf(stderr, "We could not finish writing tmp.tmp\n");
        return 4;
    }
    /* Replace the original file with the stripped copy */
    if (rename("tmp.tmp", argv[1]) != 0) {
        perror("rename");
        return 5;
    }
    return 0;
}
####################################################################################################
Sed Again and Again…
# Under UNIX: convert DOS newlines (CR/LF) to Unix format
bash$ sed 's/.$//' file # assumes that all lines end with CR/LF
bash$ sed 's/^M$//' file # in bash/tcsh, press Ctrl-V then Ctrl-M
# Under DOS: convert Unix newlines (LF) to DOS format (both methods rely on the DOS build of sed writing CR/LF on output)
C:\> sed "s/$//" file # method 1
C:\> sed -n p file # method 2
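On a Unix system sed does not add carriage returns by itself, so the reverse conversion has to append them explicitly. The following is a hedged sketch, not something from the original post; the function name unix_to_dos and the variable CR are assumptions made for the example.

```sh
#!/bin/sh
# A literal carriage return for use inside the sed script.
CR=$(printf '\r')

# unix_to_dos: read stdin, write every line with a CR/LF ending.
unix_to_dos() {
    # First strip any ^M already at the end of the line, then append one,
    # so that running this on a DOS file does not double the ^M.
    sed "s/${CR}\$//;s/\$/${CR}/"
}

# Usage: unix_to_dos < unix.txt > win.txt
```

Stripping before appending makes the conversion safe to repeat on files whose line endings are already DOS or mixed.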
And trim one more time…
tr -d '\r' < inputfile > outputfile
####################################################################################################
Now in Perl
One Command Line
The simplest perl script is this one: perl -pi -e 's/\r\n/\n/;' *.java
This does the reverse: perl -pi -e 's/\n/\r\n/;' *.java
Two Lines
If you wish to be a little more complicated, you can do the same in two lines of perl. This enables you to simply name the file(s) you wish to convert on the command line. It would be used like so: dos2unix-2line *.java
Here is what dos2unix-2line looks like:
#!/usr/bin/perl -pi
s/\r\n/\n/;
Here is what unix2dos-2line looks like:
#!/usr/bin/perl -pi
s/\n/\r\n/;
More perl…
#!/bin/sh
perl -p -i -e 'BEGIN { print "Converting DOS to UNIX.\n" ; } END { print "Done.\n" ; } s/\r\n$/\n/' "$@"