Checking out a single branch from a GitHub repo
If we need only a single code branch, e.g., "experimental", use:
git clone --single-branch --branch experimental https://github.com/[name]/[repo].git localrepo
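To confirm that only the requested branch was fetched, the clone can be wrapped in a small helper that lists the remote-tracking branches afterwards. This is a minimal sketch; the function name clone_single_branch and its argument order are illustrative and not part of git.

```shell
# clone_single_branch URL BRANCH DIR
# Hypothetical helper: clone one branch, then print the remote-tracking
# branches that ended up in the new clone.
clone_single_branch() {
    url=$1; branch=$2; dir=$3
    git clone --quiet --single-branch --branch "$branch" "$url" "$dir" || return 1
    # With a single-branch clone this should list only origin/BRANCH.
    git -C "$dir" branch -r
}

# Usage (placeholders as in the command above):
#   clone_single_branch https://github.com/[name]/[repo].git experimental localrepo
# To track another branch later (here a hypothetical branch "other"):
#   git -C localrepo remote set-branches --add origin other
#   git -C localrepo fetch origin other
```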
Searching for non-ASCII characters in a file
Most online solutions involve grep -P -n "[\x80-\xFF]", but these solutions do not work with the Mac OS X or BSD variants of grep, which do not support the -P (Perl regex) option. Instead, use Perl for a more portable solution:
perl -ne 'print if /[^[:ascii:]]/' filename.txt
Tags: software, textprocessing, Unix, MacOS
Sampling a large text file
Given a large enough text file, sampling solutions like the shuf utility will run out of memory. For statistical sampling, use one line of awk:
awk 'BEGIN {srand()} !/^$/ { if (rand() <= 0.01) print $0}' input.txt > output.txt
This returns about 1% of the non-empty lines in the file. For an exact number of lines, use a higher sampling ratio and pipe the output through head -n.
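The ratio-plus-head variant can be wrapped in a function so that the ratio and the line count become parameters. This is a sketch only; sample_lines and its argument order are illustrative names.

```shell
# sample_lines FILE RATIO COUNT
# Keep each non-empty line with probability RATIO, then stop after COUNT lines.
sample_lines() {
    awk -v ratio="$2" 'BEGIN {srand()} !/^$/ { if (rand() <= ratio) print $0 }' "$1" |
        head -n "$3"
}

# Usage: about 2% of the lines, cut off at 1000 lines
#   sample_lines input.txt 0.02 1000 > output.txt
```

Because head keeps the first COUNT sampled lines, a ratio far above COUNT divided by the number of lines skews the sample toward the start of the file, so the ratio is best kept only slightly above that value.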
Tags: software, textprocessing, Unix
Shrinking large PDF images
So you have some large (MB-sized) PDF images and you need to reduce them in size, maybe because arXiv requires images to be compressed. Starting with a file f1.pdf (1031756 bytes):
- Use ImageMagick to compress to JPG or PNG and then re-encode as PDF. The JPG compression is very efficient, but the PDF re-encoding is not.
convert f1.pdf -format JPG -quality 50 f1a.jpg → 78532 bytes (7.6%)
convert f1.pdf -format JPG -quality 10 f1a.pdf → 758028 bytes (73%)
convert f1.pdf -format JPG -quality 90 f1a.pdf → 758028 bytes (73%)
convert f1.pdf -format PNG -quality 50 f1a.pdf → 758028 bytes (73%)
- ImageMagick output can be processed with jpeg2ps and then epstopdf for better results:
convert f1.pdf -format JPG -quality 50 f1a.jpg
jpeg2ps f1a.jpg > f1a.eps
epstopdf f1a.eps → 81228 bytes (8%)
- Use Ghostscript with the /screen or /ebook PDF output settings.
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/screen -sOutputFile=f1b.pdf f1.pdf → 176120 bytes (17%)
The /ebook setting output was nearly the same size as the /screen output.
- On Mac OS X (10.14.5 Mojave), exporting from Preview with the "Reduce File Size" Quartz filter gave excellent results (105368 bytes, 10%). This is harder to access from the command line, but the default filter is at:
/System/Library/Filters/Reduce File Size.qfilter
and the ColorSync utility can create modified versions of that filter in the ~/Library/Filters folder.
Preview's Save as JPG also gives very good results:
Quality setting = 7 [1...9] → 166649 bytes (16%)
Quality setting = 5 [1...9] → 87077 bytes (8.5%)
- Adobe Acrobat Pro has a PDF Optimizer, but does not give as compact results even with 72 DPI output and minimum JPG quality settings (412528 bytes, 40%). Photoshop and Illustrator can produce fairly compact JPGs that can be wrapped back into PDF files (as above), but Preview offers a simple and good enough solution.
Bottom line: Use Preview for conversions by hand, or use the ImageMagick convert utility for JPG output, then wrap it as PDF via jpeg2ps and epstopdf, if scripting is required.
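For the scripted route, the three commands can be chained in one function, with a second helper that reports the size ratio used in the comparisons above. This is a rough sketch: shrink_pdf and size_percent are illustrative names, the input is assumed to be a single-page PDF, and convert, jpeg2ps and epstopdf are assumed to be installed.

```shell
# shrink_pdf IN.pdf OUT.pdf
# Hypothetical helper: PDF -> JPG (quality 50) -> EPS -> PDF.
shrink_pdf() {
    in=$1; out=$2
    for tool in convert jpeg2ps epstopdf; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
    done
    work=$(mktemp -d) || return 1
    convert "$in" -format JPG -quality 50 "$work/page.jpg" &&
        jpeg2ps "$work/page.jpg" > "$work/page.eps" &&
        epstopdf --outfile="$out" "$work/page.eps"
    status=$?
    rm -rf "$work"          # clean up intermediate files
    return $status
}

# size_percent ORIGINAL NEW
# Print the size of NEW as a percentage of ORIGINAL, to one decimal place.
size_percent() {
    awk -v a="$(wc -c < "$1")" -v b="$(wc -c < "$2")" \
        'BEGIN { printf "%.1f\n", 100 * b / a }'
}

# Usage:
#   shrink_pdf f1.pdf f1a.pdf && size_percent f1.pdf f1a.pdf
```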
Tags: software, MacOS, graphics, PDF
What's inside a mystery software package file?
A package (.pkg file in OS X) is an .xar archive containing a cpio.gz archive of installable files in "Payload", along with a "bill of materials", scripts, etc. To inspect the contents, unpack the .xar into a directory, and then open the Payload file:
mkdir scratch; cd scratch
xar -xf ../mystery.pkg
gunzip -dc Payload | cpio -i
The file hierarchy shows where the contents of the package would be distributed during installation.
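To look at the file list without writing the files to disk, cpio can be run in listing mode (-t) instead of extracting. A minimal sketch, assuming the Payload file has already been unpacked from the .xar as described; list_payload is an illustrative name.

```shell
# list_payload PAYLOAD
# List the files in a gzip-compressed cpio archive without extracting them.
list_payload() {
    gunzip -dc < "$1" | cpio -it
}

# Usage, from inside the scratch directory:
#   list_payload Payload
# The outer archive can be listed the same way, without unpacking it:
#   xar -tf ../mystery.pkg
```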
How is this blog generated?
This web log is generated by a modified version of BashBlog, a simple Bash script blogging engine. My version, with customized CSS and global variables, is available here.