Remember the days when at the end of a postdoctoral stint, you could copy all of your files – documents, spreadsheets, presentations, data images and email – onto a couple of 5¼-inch floppy diskettes and sashay out of the lab, secure in the knowledge that you’d backed up everything you needed? (As your age decreases from mine, substitute for 5¼-inch floppies in the following order: 3½-inch floppies, Zip disks, Jaz disks, optical mini-disks, CDs, DVDs and 2 GB hard drives. Please chime in if I’ve forgotten any of the formats – the mid-Nineties are a bit of a blur.)

Quite a handful: when the data start to wear you down.
When I was preparing for my sabbatical in Germany, I did a rough calculation to work out how much hard-drive space I’d need to bring my data back home with me. It went something like this:
(384 spots/chip) x (8 chips/library) x (40 time points) x (2 fluorescent channels/spot) x (3 MB/image) x (3 replicates) = 2,211,840 MB or approximately 2.2 terabytes
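For anyone who wants to rerun the sum with their own numbers, here is a minimal sketch of the same back-of-envelope estimate. The function and variable names are made up for illustration, the image size is taken to be 3 megabytes, and the conversion uses decimal terabytes (1 TB = 1,000,000 MB), which is how drive manufacturers count.

```python
import math

def estimate_storage_mb(factors):
    """Multiply the per-experiment factors to get total image storage in MB."""
    return math.prod(factors.values())

# Factors copied from the estimate above; the key names are illustrative only.
screen = {
    "spots_per_chip": 384,
    "chips_per_library": 8,
    "time_points": 40,
    "channels_per_spot": 2,
    "mb_per_image": 3,   # assumes the figure in the post means megabytes
    "replicates": 3,
}

total_mb = estimate_storage_mb(screen)
total_tb = total_mb / 1_000_000          # decimal terabytes
drives_needed = math.ceil(total_tb / 2)  # 2 TB drives for a single copy

print(f"{total_mb:,} MB = {total_tb:.2f} TB, i.e. {drives_needed} x 2 TB drives per copy")
# prints: 2,211,840 MB = 2.21 TB, i.e. 2 x 2 TB drives per copy
```

Changing any one entry in the dictionary (say, adding extra replicates for the suboptimal videos) shows quickly how fast the total grows.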
Given the assumption that in addition to three perfect replicates, I’d also generate a number of suboptimal videos that I’d still like to keep, I reckoned that a couple of 2 TB hard drives ought to do it. When I went and asked one of the local university IT guys for some advice on reliable drive models, I got the classic half-weary, half-suspicious IT counter-question:
“Why do you need that much space?”
I explained to him that I was doing a high-throughput image screen, and I’d calculated exactly how much room I needed for the entire experiment.
“You can’t possibly need that much,” I was told sagely. “That’s more space than your entire institute is allocated on the backup server.”
Well, quite. When I saw this person on my return and told him about my successful results, a look of panic flashed over his face. “Well, you can’t back up your experiment here,” he said, as if I’d just proposed infecting him with a virulent strain of Ebola virus.
So what’s a girl to do? The hard drives are things of beauty: brushed steel, reassuringly solid, reputable brand name, almost silent (unlike their first-generation counterparts, which sounded as if they were about to ask Air Traffic Control for permission to taxi down the runway). But rumors abound of unexpected corruptions, inexplicable failures, soured sectors and rotten RAIDs. Will two copies be enough? And what about after publication? Do I save all the original TIFFs, or only the final AVI videos? Will any self-respecting journal want to lug around all this supplemental data for me in perpetuity?
Answers to the usual place.
I’d keep the TIFFs. You never know when some clever person is going to come along with a better set of image analysis tools.
I’d suggest compressing the TIFFs as much as possible (using lossless compression, of course) and otherwise doing as you’re doing. Terabyte-sized USB hard drives are cheap. If you’re concerned about drive corruption, buy two sets.
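If you do keep two sets, one way to catch silent corruption is to compare checksums between them every so often. The following is only a sketch under stated assumptions: both drives hold the same folder layout, the images end in `.tif`, and the function names and paths are hypothetical. It uses only Python's standard library.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def mismatched_files(primary_root, backup_root, pattern="*.tif"):
    """Return relative paths that are missing from the backup or differ from it."""
    primary_root = Path(primary_root)
    backup_root = Path(backup_root)
    problems = []
    for original in sorted(primary_root.rglob(pattern)):
        relative = original.relative_to(primary_root)
        copy = backup_root / relative
        # A file counts as a problem if the copy is absent or its hash differs.
        if not copy.is_file() or sha256_of(original) != sha256_of(copy):
            problems.append(str(relative))
    return problems
```

Calling `mismatched_files("/Volumes/DriveA/screen", "/Volumes/DriveB/screen")` (made-up paths) would list any image that needs recopying. Note that a mismatch only says the two copies differ, not which one is the damaged one; a third copy or a stored list of hashes made at acquisition time would settle that.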
Stuffing these on a RAIDed, server-quality disk farm might seem like a safer idea, but then they’re out of your control. Also, high-quality disks attached to network file servers tend to be very expensive per terabyte (with good reason; they’re not consumer-grade drives). If your institutional network bods have space, I’d store an additional backup copy there (and if they back the disks up to tape, so much the better; tapes are also very cheap as an archiving solution).
Can you also dump the data to some centralized repository like GEO? That might set your mind a little more at ease, knowing there’s a totally separate copy floating around. You might have to do this for publication anyway, no?
Just some thoughts. Be thankful you’re not doing next-generation sequencing, where single instrument runs generate terabytes of image data. Argh. Urgh.
P.S. You forgot Syquest removable cartridge hard drives. Seemed like super-technology at the time. ;)
Hi Jennifer, at last count I think our server space was up around the 30 TB mark for high content screen data, and we run a completely independent setup from the rest of the institute. As for how it all works, I’m still learning… but we have a very good engineer in our group who looks after it all, and happens to be a whiz at image segmentation as a bonus. We only keep current data (i.e. in analysis) in an easy to access format. Once it’s been crunched, the numerical data goes into a MySQL database on a server and the raw image data gets compressed and stored “offline”, which basically means a big stack of replaceable 1 TB drives (not unlike the one in your picture).
If you have specific questions I’ll gladly pass them on to those that might know the answer.
Thanks, guys. I think I’m not allowed to dump these onto tape (see terrified IT bod, above), but if you think the drives are reliable enough, I’ll go for two. And I love the sound of off-site backups (anyone here remember that wonderful, now-extinct WoTC game, Netrunner?) – we definitely want to make all the data open access, but I assume journals et al. have space limits just like everyone else. I’ve not heard of GEO.
Print it out (on paper) in 3 columns: x, y, intensity.
Darren, I should have mentioned that our readout is not necessarily ideal for standard image analysis, as we’re looking for some rather highly textured, amorphous qualities. The tools to analyze some of our parameters barely exist, so I think it will be important, as Richard W mentioned, to keep all the original TIFFs.
How can you not know of the mighty Gene Expression Omnibus? It’s even MIAME compliant, whatever TF that means.
Print it out (on paper) in 3 columns: x, y, intensity.
And then, of course, if you need it electronically again, you can scan it in and use OCR to translate back to “digital” text. With hardly any transcription errors at all, oh no.
[certain aspects of the above statement might not be entirely accurate]
Ha, my first computer used tapes. As in, ordinary cassette tapes. They were the best.
Jennifer, we still keep the TIFFs, just not at our fingertips (so to speak). We use lossless compression and then stash them away. It became apparent within a very short time that the kind of phenotypes we were looking for were going to need custom analysis tools. The basic tools available are very limited in scope and we spend a lot of time developing analysis algorithms.
We have a student building a really nice Java app at the moment to access the images and do analyses without having to know how to drive CellProfiler, R, MySQL etc.
Tom, one of my lecturers in undergrad kept a huge pile of punch cards that he once used for data analysis in his PhD.
I used to have a nice collection of my PhD thesis data on 8 inch floppy disks (anyone else remember those?)
I remember loading computer games from cassette tape… they always seemed to crash with about 2 minutes to go (i.e. 25 minutes in, for the good games).
All my thesis work is on a zip disc, which seemed at the time as if it was going to be The Next Big Thing. I really ought to look into getting that data off the disc and onto a couple of DVDs and a hard drive. Not that I’ll ever need it, it would just be nice to have ;)
Darren, I like the sound of that Java app. I hope it works well. In our lab, we’ve only just got around to buying our MATLAB license and have yet to delve into the delights of advanced CellProfiler….
Jennifer, it’s being written in open source format, so I’ll let you know when it’s working… you still need someone to figure out the underlying analysis in CellProfiler and R though.
punch cards
Darren loses. That’s like the Godwin’s Law of computer hardware.
I’m getting flashbacks to the first Basic programme I wrote. I think it was 12 lines.
sudden pang of nostalgia for my MTX 512
There’s a big old box of tissues just over there.
Thanks
sniff
Ah, I remember 8-inch disks… I still have a few around the place. It’s hard, nowadays, to find equipment to run them on, but I found a compatible machine in my loft.

Has Heidi blanched in fear?
I feel a bit miffed I missed the eight inchers. Must have been just before my time. What year was this?
1907
I was backing up NMR spectra on the 8 in floppies from 1983-88 – they were the removable media of choice for an ageing Bruker 200 MHz NMR spectrometer (the other storage media were removable Winchester-type hard disks that were legendarily temperamental). By then, PCs (most reasonably wealthy labs had one, though never two) had 5.25 in floppies or old HP tape drives. I don’t remember ever seeing a PC with a drive to read the 8 in floppies, though, even an external one like in Henry’s picture (where does the USB cable plug in?).
For a while in the 90s there was a rumour doing the rounds that 8 in floppies were becoming prized because they were needed for the computers on the Russian space stations. I assume that was an urban legend.
The 1980s NMR spectrometers had the Winchester drives for fast storage of a lot of data, but the disks tended to crash and take months of work with them (sound familiar?). Backing up to the 8 in floppies was a tedious job, though – used to have to wait 30 min or more to back up 1 day’s spectra. Anyway, I’ll stop now before I start sounding any more like the Monty Python Yorkshiremen sketch.
Zip disks, argh. I kept one, just to laugh at it whenever I see it.
Ah, that explains it. I didn’t get serious about computers until about 1989. Before that I did my uni homework on one of those swish typewriters that could store one entire line, so you could edit a little bit before committing. We still used loads of Tipp-Ex, though. (How on earth did Jane Austen write Pride & Prejudice in one draft, long-hand?) Then I got seduced by my roommate’s computer, which I think was running a programme called… Word Star? Does that sound right?
I was introduced to ZX81s and persuaded my parents to invest in an MTX 512 around 1984. Tape recorders and dreams of Winchester drives. All the cool kids were getting into BBC Bs and there was never time on them at school so I lost interest. Then I wrote my project on a crappy Viglen (isn’t that redundant? Ed.) in early 1991… and discovered the Macs upstairs.
Bought my own Mac (LC475) in 1994 to write my thesis and it was downhill from then.
where does the USB cable plug in?
You really don’t want to know.
My first computer was a 48k Sinclair Spectrum (because 48k had so much more than the 16k version). Even then, people said of the ZX81 that it was good for storing the names and addresses of your friends only if you were really unpopular.
I wrote my Ph. D. thesis on an Amstrad PCW8256. You’re fired.
Feh. Amstrad. That explains a lot.
This is all very sweet.
As you were.
Jenny – yes, WordStar. I had forgotten all about it. At one time I was a ChiWriter jockey, which was similar.
That was back in the days when you used markup codes in brackets to do formatting… does this remind anyone of html? Or Nature Network forum codes, perhaps?
Time traveling back to high school, I’ll see your cassettes and punch cards, and raise you a stringy-floppy. To work with a TRS-80 (“Trash-80”) model 1, or model 2, I believe. Thank goodness that’s over.
At home we had a Commodore PET, replaced by a PET-2000, then a VIC-20. Awesome, awesome computing power.
Oh yeah, digging the memory of my 40 MB hard drive. They don’t even make flash drives that small any more.
Winchester drives
I’d forgotten they were called that.
@Eva – even better than the original 100 MB Zip disks were the updated 250 MB ones… and the 250 MB drives even came with a warning that read and write times to 100 MB disks would likely be a lot slower than on the [older] 100 MB drives. Now that’s progress.
P.S.
I’m getting flashbacks to the first Basic programme I wrote. I think it was 12 lines.
I can do better than that. Classic high school computer science class stuff:
10 PRINT "BUTTHEAD"
20 GOTO 10
For extra-specialness, include:
5 POKE xxxx,0
where xxxx is the number that disables the computer’s {BREAK} key
hahahah. You rebel!