How Did I Fix That?

A log of solutions to problems I've encountered. No warranties.

Sampling a large text file

14 June 2019

Given a large enough text file, sampling solutions like the shuf utility will run out of memory. For statistical sampling, use one line of awk:
awk 'BEGIN {srand()} !/^$/ { if (rand() <= 0.01) print $0}' input.txt > output.txt
This returns about 1% of the non-empty lines in the file (the !/^$/ pattern skips blank lines). For an exact number of lines, use a higher sampling ratio and pipe the output through head -n.
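A minimal sketch of the exact-count variant, assuming the ratio is high enough that at least N lines are expected to pass. The helper name sample_lines and its argument order are made up for illustration. Since head keeps the first N sampled lines, the result leans toward the start of the file unless the ratio is chosen close to N divided by the total line count.

```shell
# sample_lines FILE RATIO N  (hypothetical helper, not a standard utility)
# Keep each non-empty line of FILE with probability RATIO,
# then cut the output to at most N lines.
sample_lines() {
    awk -v ratio="$2" 'BEGIN { srand() } !/^$/ { if (rand() <= ratio) print $0 }' "$1" |
        head -n "$3"
}

# Example: oversample at 2%, keep the first 1000 sampled lines.
# sample_lines input.txt 0.02 1000 > output.txt
```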

Tags: software, textprocessing, Unix

Shrinking large PDF images

10 June 2019

So you have some large (MB-sized) PDF images and you need to reduce them in size, maybe because arXiv requires images to be compressed. Starting with a file f1.pdf (1031756 bytes):

  • Use ImageMagick to compress to JPG or PNG and then re-encode as PDF. The JPG compression is very efficient, but the PDF re-encoding is not.
    convert f1.pdf -format JPG -quality 50 f1a.jpg  → 78532 bytes (7.6%)
    convert f1.pdf -format JPG -quality 10 f1a.pdf  → 758028 bytes (73%)
    convert f1.pdf -format JPG -quality 90 f1a.pdf  → 758028 bytes (73%)
    convert f1.pdf -format PNG -quality 50 f1a.pdf  → 758028 bytes (73%)
    
  • ImageMagick output can be processed with jpeg2ps and then epstopdf for better results:
    convert f1.pdf -format JPG -quality 50 f1a.jpg
    jpeg2ps f1a.jpg > f1a.eps
    epstopdf f1a.eps   → 81228 bytes (8%)
    
  • Use Ghostscript with the /screen or /ebook PDF output settings.
    gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
           -dPDFSETTINGS=/screen -sOutputFile=f1b.pdf f1.pdf  → 176120 bytes (17%)
    
    The /ebook setting output was nearly the same size as the /screen output.
  • On Mac OS X (10.14.5 Mojave), exporting from Preview with the "Reduce File Size" Quartz filter gave excellent results (105368 bytes, 10%). This is harder to access from the command line, but the default filter is at:
    /System/Library/Filters/Reduce File Size.qfilter
    
    and the ColorSync utility can create modified versions of that filter in the ~/Library/Filters folder.
    Preview's Save as JPG also gives very good results:
    Quality setting = 7 [1...9]  → 166649 bytes (16%)
    Quality setting = 5 [1...9]  → 87077 bytes (8.5%)
    
  • Adobe Acrobat Pro has a PDF Optimizer, but its results are not as compact, even with 72 DPI output and minimum JPG quality settings (412528 bytes, 40%). Photoshop and Illustrator can produce fairly compact JPGs that can be wrapped back into PDF files (as above), but Preview offers a simple and good enough solution.

Bottom line: Use Preview for conversions by hand. If scripting is required, use the ImageMagick convert utility for JPG output, then wrap it as PDF via jpeg2ps and epstopdf.
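For the scripted route, a rough sketch that chains the three tools. The function name shrink_pdf, its arguments, and the default quality of 50 are my own choices for illustration. It assumes a single-page PDF image and that convert, jpeg2ps, and epstopdf are all on the PATH.

```shell
# shrink_pdf IN.pdf OUT.pdf [QUALITY]  (hypothetical helper name)
# Compress to JPG with ImageMagick, then wrap the JPG as a PDF
# via jpeg2ps and epstopdf. Assumes a single-page input.
shrink_pdf() {
    local src="$1" dst="$2" quality="${3:-50}" tool tmp status
    [ -f "$src" ] || { echo "shrink_pdf: no such file: $src" >&2; return 1; }
    for tool in convert jpeg2ps epstopdf; do
        command -v "$tool" >/dev/null 2>&1 ||
            { echo "shrink_pdf: $tool not found" >&2; return 1; }
    done
    tmp=$(mktemp -d) || return 1
    if convert "$src" -format JPG -quality "$quality" "$tmp/page.jpg" &&
       jpeg2ps "$tmp/page.jpg" > "$tmp/page.eps" &&
       epstopdf --outfile="$dst" "$tmp/page.eps"
    then status=0
    else status=1
    fi
    rm -rf "$tmp"    # clean up the intermediate JPG and EPS
    return "$status"
}

# Example: shrink_pdf f1.pdf f1a.pdf 50
```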

Tags: software, MacOS, graphics, PDF

What's inside a mystery software package file?

27 March 2019

A package (.pkg file in OS X) is an .xar archive containing a cpio.gz archive of installable files in "Payload", along with a "bill of materials", scripts, etc. To inspect the contents, unpack the .xar into a directory, and then open the Payload file:

mkdir scratch; cd scratch
xar -xf ../mystery.pkg
gunzip -dc Payload | cpio -i

The file hierarchy shows where the contents of the package would be distributed during installation.
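To look at the hierarchy without leaving extracted files behind, cpio can list the archive instead of unpacking it (-t). A small sketch, assuming Payload sits at the top level of the archive as in the steps above. The name pkg_list is hypothetical.

```shell
# pkg_list PKG  (hypothetical helper name)
# Unpack the .xar into a throwaway directory and list the Payload
# contents with cpio -t instead of extracting them.
pkg_list() {
    local pkg="$1" dir tmp status
    [ -f "$pkg" ] || { echo "pkg_list: no such file: $pkg" >&2; return 1; }
    # Resolve to an absolute path, since unpacking happens in a scratch directory.
    dir=$(cd "$(dirname "$pkg")" && pwd) || return 1
    pkg="$dir/$(basename "$pkg")"
    tmp=$(mktemp -d) || return 1
    if (cd "$tmp" && xar -xf "$pkg" && gunzip -dc Payload | cpio -it)
    then status=0
    else status=1
    fi
    rm -rf "$tmp"
    return "$status"
}

# Example: pkg_list mystery.pkg | less
```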

Tags: MacOS, software

Upgrading Python packages with Anaconda

12 March 2019

To update all installed packages:

conda update conda
conda update --all

To upgrade to a new distribution:

conda update conda
conda update anaconda

This moves to the latest stable release of the Anaconda distribution, which is usually not what we're looking for.

Tags: Python, Anaconda

No sound on my Mac

05 October 2018

Switching the output sound device repeatedly (e.g., going back and forth between external speakers and headphones several times) sometimes kills sound output on my laptop (currently on OS X 10.13 High Sierra). To fix, restart the Core Audio services:

sudo killall coreaudiod
Annoying, but workable.

Tags: MacOS, bug-workaround

Why won't my Mac go to sleep?

29 August 2018

On Mac OS X 10.13 (High Sierra), power management status is reported by:

pmset -g
pmset -g assertions
Items with non-zero assertions (like "UserIsActive") are preventing sleep.
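To pull out just the offending items, a small awk filter can help. This is a sketch that assumes the system-wide section prints one "Name  Count" pair per line, which is how I remember the output looking. The function name nonzero_assertions is made up.

```shell
# nonzero_assertions  (hypothetical helper name)
# Read `pmset -g assertions` output on stdin and print the names of
# system-wide assertions whose count is greater than zero.
# Assumes those lines have exactly two fields: a name and an integer.
nonzero_assertions() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Usage: pmset -g assertions | nonzero_assertions
```

The per-process lines further down in the output have more than two fields, so the filter skips them.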

Tags: MacOS

How is this blog generated?

29 August 2018

This web log is generated by a modified version of BashBlog, a simple Bash script blogging engine. My version, with customized CSS and global variables, is available here.

Tags: HTML, software