diff and regular expressions

diff is best known for comparing two files, but it can do much more. With the -r option it compares whole directory trees, and it can ignore changes whose lines match a regular expression (such as comment lines that always start with the same characters). Why would you want to do this? One case is comparing two versions of a program to see whether any bugs could have been introduced in the newer version.

For example, I wanted to see whether there were any obvious changes in qtopia-core-opensource-4.3.1 that might cause it to fail to build on an x86_64 system. Assuming that the earlier version (4.3.0) did actually work, it is a simple matter to diff the two directories:

diff -r qtopia-core-opensource-src-4.3.0 qtopia-core-opensource-src-4.3.1 > changes.txt

However, the package maintainers changed the license text in each file, burying the changes that mattered to me under thousands of lines of comment changes. A good solution is to add a regular expression to the diff command so that it ignores the comment lines (which luckily all start with ‘**’):

diff -r --ignore-matching-lines="\*\*" qtopia-core-opensource-src-4.3.0 qtopia-core-opensource-src-4.3.1 > changes.txt
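One detail worth knowing is that diff ignores a hunk only when every changed line in it matches the expression, and that an unanchored "\*\*" matches those characters anywhere in a line. The sketch below tries this on two throwaway trees; the file names and contents are made up for illustration, -I is the short form of --ignore-matching-lines, and the pattern is anchored with ^ as an assumption about what is wanted.

```shell
#!/bin/sh
# Build two tiny trees in a scratch directory (all names here are hypothetical).
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"

# a.c: only the '**' license comment line differs.
printf '** License v1\nint x = 1;\n' > "$tmp/old/a.c"
printf '** License v2\nint x = 1;\n' > "$tmp/new/a.c"

# b.c: a code line right next to the comment differs as well.
printf '** License v1\nint y = 1;\n' > "$tmp/old/b.c"
printf '** License v2\nint y = 2;\n' > "$tmp/new/b.c"

# '^\*\*' matches only lines that begin with two asterisks.
# diff exits with status 1 when it finds differences, hence the '|| true'.
result=$(cd "$tmp" && diff -r -I '^\*\*' old new || true)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Here a.c should drop out of the report, while b.c is still shown together with its license line, because that comment change sits in the same hunk as a real code change.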

Finally, there are many pesky .png files that have changed, so it is useful to leave them out as well by using the --exclude option:

diff -r --ignore-matching-lines="\*\*" --exclude="*.png" qtopia-core-opensource-src-4.3.0 qtopia-core-opensource-src-4.3.1 > changes.txt
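To see what --exclude actually removes, it can help to try it on a small scratch tree first. This is only a sketch: the file names and contents are invented, and the "image" is just a text file with a .png name.

```shell
#!/bin/sh
# Scratch trees with hypothetical file names, just to watch --exclude at work.
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new"

printf 'fake image data 1\n' > "$tmp/old/logo.png"
printf 'fake image data 2\n' > "$tmp/new/logo.png"
printf 'int main(void) { return 0; }\n' > "$tmp/old/main.c"
printf 'int main(void) { return 1; }\n' > "$tmp/new/main.c"

# 'diff -r' prints a header line starting with "diff " for each differing
# pair of text files; 'grep -c' counts those headers.
without_exclude=$(cd "$tmp" && diff -r old new | grep -c '^diff ')
with_exclude=$(cd "$tmp" && diff -r --exclude='*.png' old new | grep -c '^diff ')

echo "files reported without --exclude: $without_exclude"
echo "files reported with --exclude:    $with_exclude"

rm -rf "$tmp"
```

The pattern is a shell-style glob matched against the base name of each file or directory, so it should be quoted to keep the shell from expanding it first.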

Both of these options help to cut down the output, but unfortunately the result is still too big to be very useful. For comparison, the original diff was 192,286 lines long and the final diff was 84,126 lines long, a reduction of roughly 56%.
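When the full diff is too long to read, one possible next step (not something I did above) is to ask diff for file names only with -q, also spelled --brief, and then look at individual files of interest. A minimal sketch on made-up trees, reusing the same two filters with the pattern anchored at the start of the line:

```shell
#!/bin/sh
# Hypothetical trees: one file with only a license change, one with a real
# code change, and one file that exists only in the new tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/old/src" "$tmp/new/src"

printf '** License v1\nint a = 1;\n' > "$tmp/old/src/same.c"
printf '** License v2\nint a = 1;\n' > "$tmp/new/src/same.c"
printf '** License v1\nint b = 1;\n' > "$tmp/old/src/changed.c"
printf '** License v2\nint b = 2;\n' > "$tmp/new/src/changed.c"
printf 'only in the new tree\n' > "$tmp/new/src/added.c"

# -q prints one line per differing or unmatched file instead of full hunks.
# diff exits with status 1 when it finds differences, hence the '|| true'.
summary=$(cd "$tmp" && diff -rq -I '^\*\*' --exclude='*.png' old new || true)
printf '%s\n' "$summary"

rm -rf "$tmp"
```

The summary should mention changed.c and added.c but not same.c, whose only change is the ignored license line.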
