
Automated Renaming and Filing of Photos and Screenshots on the Mac

The Problem

Somewhere around 2016, maybe a bit later, but coinciding with a new Mac, I discovered that Apple had changed the way it named screenshots. Specifically, the format switched from:

Screenshot YYYY-MM-DD HH.mm.ss.png

to

Screen Shot YYYY-MM-DD at hh.mm.ss AM

or

Screen Shot YYYY-MM-DD at hh.mm.ss PM

Why does this matter? Well, it means that screenshots no longer sort in chronological order, because with a 12-hour clock an alphabetical listing puts, say, 01.15.00 PM ahead of 11.30.00 AM. When you have a lot of screenshots, this really is a nuisance. I am not completely sure whether this is a conscious choice by Apple, but you can make your Mac do the same thing by opening up the Screenshot application (⌘-shift-5) and changing the mode from ‘Screenshots’ to ‘Clipboard’ and back.
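
A quick way to see the ordering problem is to sort a few names alphabetically. This is a minimal sketch: the file names are made up, and it assumes zero-padded hours as in the hh.mm.ss pattern above. Finder's own sorting may differ in details, but a plain string sort shows the effect.

```python
# Hypothetical screenshot names in the newer 12-hour "Screen Shot ... AM/PM" style,
# taken at 09:05, 11:30 and 13:15 on the same day.
names = [
    "Screen Shot 2019-05-12 at 11.30.00 AM.png",
    "Screen Shot 2019-05-12 at 01.15.00 PM.png",
    "Screen Shot 2019-05-12 at 09.05.00 AM.png",
]

# A plain alphabetical sort compares the hour digits before it ever reaches AM/PM,
# so the 1:15 PM screenshot jumps ahead of both morning ones.
ordered = sorted(names)
for name in ordered:
    print(name)
```

With the old 24-hour "Screenshot YYYY-MM-DD HH.mm.ss" names, the same sort would put 13.15.00 last, as expected.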

The Solution

After many years of searching old internet posts for a solution to the problem, I decided to change tack. There are some internal settings that you can change at the command line, but none would let me switch the file name back to the format I preferred. At the same time, I decided to automatically file the screenshots in folders organised by year, just to make finding my newest screenshots easier. You might ask why I keep them. Good question. I do not know, and I may very well delete them all at some point.

The solution is two-fold. I wrote a Python script to automate the changing of filenames, creation of folders, and moving of files, and used cron to run the script automatically once a minute. That is usually often enough for a new screenshot to be renamed and moved before I get around to including it in the email, presentation, or document that I am working on. Why Python and not R? After all, I am more proficient in the latter than the former. I think Python’s handling of regular expressions is nicer and easier to work with. I also find the way it handles files, paths, and the operating system a little more logical and robust. Shout about this if you want; I promise I will not be listening.
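
The renaming and filing steps can be sketched with a regular expression and pathlib. This is a minimal illustration, not the actual script from the gists: it assumes names of the form "Screen Shot YYYY-MM-DD at h.mm.ss AM.png" with an ordinary space before AM/PM (the exact spacing may differ between macOS versions), and the names NEW_STYLE, old_style_name and tidy are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumed pattern for the newer macOS names, e.g. "Screen Shot 2019-05-12 at 1.15.00 PM.png".
# The hour may or may not be zero-padded, so 1 or 2 digits are accepted.
NEW_STYLE = re.compile(
    r"^Screen Shot (\d{4})-(\d{2})-(\d{2}) at (\d{1,2})\.(\d{2})\.(\d{2}) ([AP]M)\.png$"
)


def old_style_name(name):
    """Return 'Screenshot YYYY-MM-DD HH.mm.ss.png' on a 24-hour clock, or None if no match."""
    match = NEW_STYLE.match(name)
    if match is None:
        return None
    year, month, day, hour, minute, second, half = match.groups()
    hour = int(hour) % 12          # 12 AM becomes 0; 12 PM also becomes 0 here...
    if half == "PM":
        hour += 12                 # ...and is moved back to 12, while 1 PM becomes 13
    return f"Screenshot {year}-{month}-{day} {hour:02d}.{minute}.{second}.png"


def tidy(folder):
    """Rename new-style screenshots in `folder` and move them into per-year subfolders."""
    folder = Path(folder)
    for path in list(folder.iterdir()):        # snapshot the listing before moving files
        if not path.is_file():
            continue
        new_name = old_style_name(path.name)
        if new_name is None:
            continue                           # leave anything else alone
        year_dir = folder / new_name[11:15]    # "Screenshot " is 11 characters, then YYYY
        year_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(year_dir / new_name))
```

Calling tidy on the folder where screenshots are saved (the Desktop by default) from a small script is then all that cron needs to run. Converting to a zero-padded 24-hour clock is what restores chronological sorting.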

The Python

I have written two scripts: one deals with the screenshots, and the other effectively does the same thing for camera uploads to Dropbox. The second script is particular to the way Dropbox imports images, but it is not very complex and so could be adapted to another system. I also toyed with extracting the date from the EXIF tags rather than manipulating file names, but this seems a bit clunky in Python (the date information appears to be accessible only by a numeric tag, not a named one). The scripts are hosted on GitHub as gists: jmcurran/tidyScreenshots.py and jmcurran/cleanCamUploads.py.
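
For the camera uploads, the timestamp can be parsed with datetime.strptime rather than a regular expression, which also rejects impossible dates. This sketch assumes Dropbox names uploads like "2019-05-12 13.15.00.jpg"; that naming pattern, the possibility of a suffix after the timestamp, and the names STAMP_FORMAT, STAMP_LENGTH and year_folder_for are assumptions, not details taken from the script.

```python
from datetime import datetime
from pathlib import Path

# Assumed Dropbox camera-upload naming: "YYYY-MM-DD HH.MM.SS" followed by an extension,
# possibly with a suffix such as "-1" after the timestamp (also an assumption).
STAMP_FORMAT = "%Y-%m-%d %H.%M.%S"
STAMP_LENGTH = len("2019-05-12 13.15.00")   # 19 characters


def year_folder_for(name):
    """Return the year (as a string) encoded in a camera-upload file name, or None."""
    stem = Path(name).stem                   # drop the final extension, e.g. ".jpg"
    try:
        taken = datetime.strptime(stem[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None                          # not in the assumed format; leave it alone
    return str(taken.year)
```

Moving each matching file into a folder named after the returned year is then a matter of Path.mkdir and shutil.move.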


cron

cron is a software utility that acts as a time-based job scheduler for Unix-like operating systems. Users can use cron to schedule commands or scripts to run at fixed times, dates, or intervals. My solution depends on cron being available, and hence it is usable on both Mac and Linux systems. It is unclear to me (although I will investigate) whether one can use cron under the Windows Subsystem for Linux (WSL).

I have two entries in the crontab file, one for each of my scripts. Each entry says that the script should be executed every minute of every day. Although this sounds rather intense, it has negligible impact on the performance of my system. The crontab file can be edited by opening a terminal window (open Spotlight, type terminal, and hit enter) and typing crontab -e at the command line. This will bring up the file in a text editor, by default vi. If you are not familiar with vi, this may be somewhat painful. Some useful commands:

Keystroke            Action
---------            ------
x                    delete character under the cursor
dd                   delete line
i                    enter insert mode
ESC                  exit insert mode
ZZ (shift-z twice)   save file and exit

As noted I set my scripts to run every minute. My crontab file looks like this:

0-59 * * * * /Users/jcur002/opt/anaconda3/bin/python3 $HOME/Dropbox/Code/Python/tidyScreenshots/cleanScreenShots.py >> ~/cron.log 2>&1
0-59 * * * * /Users/jcur002/opt/anaconda3/bin/python3 $HOME/Dropbox/Code/Python/tidyCamUploads/tidyCamUploads.py >> ~/cron.log 2>&1


The key elements here are:

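  1. 0-59 in the first (minute) field means run every minute; a * there would have the same effect.
  2. The four asterisks are used to specify the hour (0-23), day of the month (1-31), month (1-12), and day of the week (0-6, with 0 being Sunday). An asterisk matches every value, so my jobs run every minute of every day.
  3. The path to the python binary must be fully specified, because cron runs jobs with a minimal PATH rather than your usual shell environment.
  4. The location of each script must also be unambiguously specified, but can rely on shell variables like $HOME.
  5. >> ~/cron.log 2>&1 appends both the normal output and any error messages from each run to cron.log in my home directory, which is handy when something goes wrong.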
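
To make the field semantics concrete, here is a toy matcher for five-field expressions. It is a deliberately reduced sketch, not how cron itself is implemented: it handles *, a-b ranges and comma lists only, and ignores step values, month and day names, and cron's special rule for when both day fields are restricted. The names field_matches and schedule_matches are hypothetical.

```python
def field_matches(field, value):
    """Toy matcher for one cron field: supports '*', 'a-b' ranges and comma lists only."""
    for part in field.split(","):
        if part == "*":
            return True                      # wildcard: every value matches
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            if start <= value <= end:
                return True                  # inclusive range, e.g. 0-59
        elif int(part) == value:
            return True                      # a single number
    return False


def schedule_matches(expr, minute, hour, day, month, weekday):
    """True if a five-field expression (minute hour day month weekday) fires at that time."""
    fields = expr.split()
    values = (minute, hour, day, month, weekday)
    return all(field_matches(f, v) for f, v in zip(fields, values))
```

For example, schedule_matches("0-59 * * * *", 37, 14, 12, 5, 0) is true for any time you pass in, which is exactly the "every minute of every day" behaviour described above.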

I should note that there is a strong element of “monkey see, monkey do” here. I am not completely au fait with cron but it appears to do what I want it to do. I hope you find this useful.


Python and statistics – is there any point?

This semester I gave my graduate student class a project. The brief was relatively simple: implement the iteratively reweighted least squares (IRLS) algorithm to perform a simple (single covariate) logistic regression in Python. Their programmes were supposed to read data in from a text file, perform the simple matrix algebra and math needed to carry out the IRLS computation, and return some formatted output, similar to what you would get from R’s summary.glm function. Of course, you do not need matrix algebra to do this, but the idea was for the students to learn a bit of mathematical statistics that they had not seen before. On the IRLS front, they were allowed to use a simple least squares routine like numpy’s linalg.lstsq and some of numpy’s simple matrix operators, but were expressly forbidden from simply loading pandas or statsmodels and using the generalized linear model functions contained therein.
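
For readers who have not met IRLS, here is a minimal sketch of the core computation the brief describes, assuming one covariate plus an intercept and no missing data. Each iteration computes weights w = mu(1 - mu) and a working response z = eta + (y - mu)/w, then solves a weighted least squares problem with numpy's linalg.lstsq; standard errors come from the inverse of X'WX. It leaves out reading from a text file and most of the summary.glm-style output, and the name irls_logistic and the demo data are hypothetical.

```python
import numpy as np


def irls_logistic(x, y, tol=1e-8, max_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])    # design matrix with an intercept column
    beta = np.zeros(2)
    for _ in range(max_iter):
        eta = X @ beta                           # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))          # fitted probabilities
        w = mu * (1.0 - mu)                      # IRLS weights (binomial variance)
        z = eta + (y - mu) / w                   # working response
        sw = np.sqrt(w)
        # Weighted least squares: minimise sum_i w_i * (z_i - x_i' beta)^2
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
        step = np.max(np.abs(beta_new - beta))
        beta = beta_new
        if step < tol:
            break
    # Standard errors from the inverse Fisher information (X' W X)^-1 at the estimate
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Small made-up data set (not perfectly separated, so the estimates are finite).
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
beta, se = irls_logistic(x_demo, y_demo)
for label, b, s in zip(["(Intercept)", "x"], beta, se):
    print(f"{label:<12} {b:9.4f} {s:9.4f} {b / s:8.3f}")   # estimate, SE, z value
```

The printed columns loosely mimic the coefficient table from summary.glm; np.linalg.inv is used for brevity, and a more careful version would reuse a factorisation instead.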

I thought this sounded like a straightforward enough task. The students divided themselves into pairs to work on it, and they had 13 weeks to complete the task.

The kicker was that I did not provide any instruction, either in Python or in the IRLS algorithm. An aim of the project was to simulate the situation where someone asks you to solve a problem, and you have to go and do some research to do it. Their first task was to complete 100 exercises on codecademy.com as a reasonable introduction to a language none of them had seen before.

Problems – versions

There are two major versions of Python in the wild, 2.7 and 3.4. Codecademy teaches using version 2.7. One fundamental difference between 2.7 and 3.4 is print: in 2.7 it is a statement, whereas in 3.x it is a function and needs parentheses. All of my students are users of R, to varying levels of skill. When they go to install R at home, they know to go to the CRAN website, or a mirror, and download the current, stable release of R. If they followed the same policy for Python, as I did myself, then they would have installed Python 3.4 and found that the way Codecademy taught them to use print does not work, without any sort of helpful “That syntax has been deprecated. Python 3 onwards uses the syntax…” message. This is not the only issue: the way Python 3.4 handles loops over numbered ranges is another fundamental difference, since range() now returns a lazy range object rather than a list, and xrange() no longer exists.
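
Both differences can be seen in a few lines of Python 3. This is a small illustration rather than a complete list of changes; the Python 2.7 forms appear only in comments, because they would not run under Python 3.

```python
# Python 3: print is a function, so the parentheses are required.
# (Python 2.7 also accepted the statement form: print "Hello, world")
print("Hello, world")

# Python 3: range() returns a lazy range object, not a list.
# (In Python 2.7, range() built a list and xrange() was the lazy version;
#  Python 3 has no xrange().)
r = range(5)
print(type(r).__name__)   # range
print(list(r))            # [0, 1, 2, 3, 4]

# Code written for 2.7 that treats range() as a list, e.g. range(5).append(25),
# raises AttributeError in Python 3; an explicit list() conversion is needed.
values = list(range(5))
values.append(25)
print(values)             # [0, 1, 2, 3, 4, 25]
```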

Problems – platform issues

Most students at my institution use Windows, especially at home. There is some Mac penetration, and Linux is virtually non-existent (these are statistics students, not computer science students, remember). The official Python installers work perfectly well under Windows in my experience. However, then we come to the issue of installing numpy. The official advice from the numpy website seems to be “download a third party version of Python which already has it.” For students who come from a world where a package can be installed by going to a menu, this is less than useful. The common advice from the web is that “there is no official release of numpy 1.8.1 for Python 2.7 or higher for Windows”, but that you can download and install it from the builds very thoughtfully provided by Christoph Gohlke at UC Irvine. Christoph’s builds work fine, but again, for something that seems, at least from the outside, very mainstream in the Python community, should the user have to go to this level of effort?

Problems – local installations

Like any instructor, I face the issue that a number of my students have no option but to use the computer laboratories provided for them by the university. This means that we encounter the issue of installing libraries locally for individual users. Most, if not all, R packages from CRAN can be installed in a local library. As far as I can tell, this is not true for a Windows installation of Python, although I am happy to be corrected on this point. The aforementioned Python binaries come with proper Windows installers, which want to install into the Python root directory, something students do not have permission to do. If I had realized this problem in December of last year, I could have asked the admins to pre-install it for all users; however, given that I only formulated the problem in February, it was just a tad too late.

Would I do it again?

I might, but there would have to be serious efforts on my part to resolve the problems listed above. Even then, that would not solve the problems of students trying to set up Python at home, and I do not feel like hand-holding people through an installation process. My initial plan had been to try JavaScript. I may return to this idea.

I would be the first to admit that I am not a Python user, but I am a programmer with over thirty years of experience in at least a dozen different languages, and on multiple platforms. I know many people find Python a very useful language for their scientific computing, and I am not attempting to badmouth the language (it seems a decent enough language, with the constructs and functionality that I would expect to find in any modern language), but I do not think there is much incentive for a statistician to move away from R, or an R/C++ combination when raw compute power is required.

I am glad that my students experienced programming in a non-vectorized language. R does give a distorted perspective on programming with regard to its handling of vectors, and I think it is beneficial for students to learn about control-flow structures for element-wise computation.
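
The contrast can be seen by writing the same element-wise computation, the logistic function, both ways in Python: once as an explicit loop with its own flow control, and once as a single NumPy expression, which is closer to how it would be written in R. This is an illustrative sketch; the function names are hypothetical.

```python
import math

import numpy as np


def logistic_loop(eta):
    """Element-wise version: visit each value explicitly and build up a list."""
    out = []
    for value in eta:
        out.append(1.0 / (1.0 + math.exp(-value)))
    return out


def logistic_vectorized(eta):
    """Vectorized version: one NumPy expression applied to the whole array at once."""
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))
```

Both return the same numbers; the loop makes every step visible, while the vectorized form hides the iteration inside NumPy.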

Update

Nat Dudley has suggested that I use an online IDE like nitrous.io.

Second update

Despite the difficulties, nearly all of my students have managed to complete the task, and some have done an exceptionally good job, even adding in the ability to parse R-like formulae.


