Acropolis, a cluster for the Social Sciences Division


A few years ago, a student contacted SSCS with a vexing problem that threatened her thesis and graduation. Her work depended on single-threaded open-source code that combed through a large data set for particular data points and then passed those points to an algorithm for final analysis. She needed to run this analysis over many large data sets, and she needed to tweak her process as her research uncovered new answers. By her calculations, she would need years to complete this analysis.

Because her code was single-threaded, she could not make use of the larger memory or multiple cores available on the servers of the day. Running the program tied up her laptop for days. Rewriting the code to take advantage of multi-core systems was risky: weeks of development and debugging were not guaranteed to speed up the analysis enough to pay off in time or effort.

She met with the SSCS Server Group, and together they built a framework for her research in which she could run dozens of analyses simultaneously. Within a few hours, she was able to submit batches of tasks to the cluster and move on with the rest of her workload unimpeded. She completed this pesky phase of her thesis work in a matter of weeks rather than years.

This sort of effective solution is precisely why the Acropolis cluster has grown from an experimental system into a workhorse for Social Sciences Division researchers.

Unlike other clusters on campus, Acropolis and its administrators are singularly focused on supplying resources for the computational tasks of the Social Sciences Division (SSD). SSCS administrators work to help SSD researchers avoid pitfalls when scaling their work into a distributed environment.

A common problem resembles that of the researcher at the start of this article, who found an open-source program that performed the task she needed but was not written at the scale of the work she had in mind. The programmer wrote the software to analyze small datasets, not the gargantuan files from which she wanted answers. Although she was not a computer scientist, she could have devoted time to rewriting the software. Rather than rewrite slow but sufficient code, the effective solution was to harness the nodes of the cluster to churn through many datasets at the same time.
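The strategy of running many independent copies of unchanged single-threaded code at once is often called an "embarrassingly parallel" workload. The sketch below illustrates the idea on a single multi-core machine using Python's standard-library ProcessPoolExecutor. It is not how jobs are submitted to Acropolis, whose scheduler this article does not describe; the function `analyze` is a hypothetical stand-in for the researcher's program, and the datasets are toy ranges.

```python
from concurrent.futures import ProcessPoolExecutor


def analyze(dataset):
    # Hypothetical stand-in for the unchanged single-threaded program:
    # comb through the dataset for points of interest (here, multiples of 3),
    # then pass them to a final analysis step (here, a simple sum).
    selected = [x for x in dataset if x % 3 == 0]
    return sum(selected)


def run_all(datasets, workers=4):
    # Each dataset is processed by a separate worker process, so the
    # single-threaded code itself never has to be rewritten.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, datasets))


if __name__ == "__main__":
    toy_datasets = [range(0, 10), range(10, 20), range(20, 30)]
    print(run_all(toy_datasets))
```

On a cluster, the same pattern typically becomes one batch job per dataset, with the scheduler playing the role of the process pool and each node running its own copy of the program.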

Other research tasks might require more memory than is available on any single node. In that case, the cluster can spread a large problem across multiple nodes to complete a task that would otherwise be infeasible.

The Acropolis cluster supplies access to a number of programming languages, ranging from Julia, R, Python, and Fortran to Mathematica and MATLAB, as well as a variety of database systems, including MySQL, PostgreSQL, and MongoDB. Supporting this breadth of statistical systems requires ongoing administrative attention and tuning.

Learn more about Acropolis, including how eligible SSD researchers can obtain an account, at https://sscs.uchicago.edu/page/server-support-faqs