How Did I Fix That?

A log of solutions to problems I've encountered. No warranties.

Searching for non-ASCII characters in a file

03 December 2019

Most solutions found online use grep -P -n "[\x80-\xFF]", but -P (Perl-compatible regular expressions) is a GNU grep extension that the grep shipped with Mac OS X and other BSD variants does not support. Perl offers a more portable solution:
perl -ne 'print if /[^[:ascii:]]/' filename.txt

Tags: software, textprocessing, Unix, MacOS

Sampling a large text file

14 June 2019

On a large enough text file, sampling tools like the shuf utility can run out of memory. For statistical sampling, a single line of awk processes the file one line at a time:
awk 'BEGIN {srand()} !/^$/ { if (rand() <= 0.01) print $0}' input.txt > output.txt
This returns about 1% of the non-blank lines in the file (the !/^$/ pattern skips empty lines). For an exact number of lines, use a higher sampling ratio and pipe the output through head -n.
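
The exact-count variant can be sketched as a small shell function. The name sample_n and its argument order are hypothetical, chosen only for illustration; the ratio is passed to awk with -v instead of being hard-coded:

```shell
# Hypothetical helper: sample_n RATIO COUNT FILE
# Keeps each non-blank line with probability RATIO, then truncates
# the result to at most COUNT lines.
sample_n() {
    awk -v ratio="$1" 'BEGIN {srand()} !/^$/ { if (rand() <= ratio) print $0 }' "$3" | head -n "$2"
}

# Demo: 100000 numbered lines, sampled at 5%, cut down to 20 lines.
demo_file=$(mktemp)
seq 1 100000 > "$demo_file"
sample_n 0.05 20 "$demo_file" | wc -l
rm -f "$demo_file"
```

Because head keeps the first COUNT lines that pass the filter, the result is drawn from the beginning of the file. Choosing a ratio only slightly above COUNT divided by the total line count spreads the sample more evenly, at the risk of occasionally getting fewer than COUNT lines.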

Tags: software, textprocessing, Unix