Minutes of the storage EVO meeting, 11 Aug 2010

Present:
Glasgow: Sam, David
Edinburgh: Wahid
Liverpool: John, Stephen
Bristol: Winnie
QMUL: Chris W
Lancaster: Matt
Manchester: Alessandra
RAL: Brian, James, Jens (chair+mins)

*** Don't forget the tasklist on ***
https://savannah.cern.ch/projects/srmsupportuk/

0. We probably need a semi-regular tasklet on wiki updates

Jens updated the front page, removing obsolete mailing lists. Sam will
update the information about the DPM toolkit.

1. Progress with T2K syncat exercise at Lancaster - Sam+Matt plus others

The checker has been updated. Chris is debugging the StoRM version at
QMUL. Brian has also obtained a dump from the LFC via the DBAs,
essentially a CSV of guid, size, and SURLs. It contains about 3E5
files, but it is not clear whether they all belong to T2K, nor how
many are at Lancaster.

ATLAS have a database->syncat tool, which is probably generic enough
to be useful in a wider context. Syncat allows for guid, filesize and
checksum. It could also check ACLs, but the physicists don't care too
much about those.

The plan needs fleshing out a bit now - Jens

2. ext4 vs xfs revisited (again) - cf Alessandra's find on the list

Sam has a DPM with ext4 disk servers in production, and will have more
when some hardware problems have been resolved. How is ext4 supported
in the SL5 kernel? Via an extra package (ext4-progs), but it is
stable. It is not clear whether such filesystems can be dumped yet,
but that is a concern for normal machines, not an issue for a DPM disk
server.

Wahid is testing the 64 bit support: current ext4 filesystems are
limited to 16 TB because they cannot create enough inodes. Ted Ts'o
has written a version for 64 bit, but it is highly experimental at
this stage and seems to have unresolved compilation dependencies
against some kernels (as expected for experimental code).

3. Detecting and coping with uneven load (unbalanced datasets) on disk
servers - progress, recommendations?

Sam's tool does the check and calculates stats for the file
distribution.
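A minimal sketch of such a distribution check - this is not Sam's
actual tool, and it assumes the input is simply an iterable of
(disk server, file size) pairs, e.g. parsed from a DPM database dump:

```python
# Hypothetical sketch: summarise how files and bytes are spread
# across disk servers, so imbalanced servers stand out.
from collections import defaultdict

def distribution_stats(rows):
    """rows: iterable of (server, size_in_bytes) pairs.

    Returns, per server: file count, total bytes, and that server's
    share of the pool's total bytes.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for server, size in rows:
        counts[server] += 1
        totals[server] += size
    grand_total = sum(totals.values())
    stats = {}
    for server in counts:
        stats[server] = {
            "files": counts[server],
            "bytes": totals[server],
            "share": totals[server] / grand_total if grand_total else 0.0,
        }
    return stats

if __name__ == "__main__":
    # Toy input; in practice this would come from a catalogue dump.
    rows = [("se01", 4000000), ("se01", 6000000), ("se02", 1000000)]
    for server, s in sorted(distribution_stats(rows).items()):
        print(server, s["files"], s["bytes"], round(s["share"], 2))
```

A rebalancer could then pick the servers with the largest share and
move individual files to those with the smallest, within the same
pool.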
The tool can be released to others if they're interested, although it
needs some code tidying and some more work. It assumes an ATLAS-like
setup where filesets are in directories.

DPM doesn't quite round-robin, at least not when other transfers are
going on at the same time. In fact, under some circumstances it may
even round-robin to a full disk server. And it does not fail
gracefully when a transfer into a disk server fails - in the sense
that that particular file transfer will then fail. Also, when new disk
servers are added to a pool, they will of course (hopefully) be empty
initially.

Ideally, a tool should be able to rebalance the files, e.g. by calling
out to a tool that transfers individual files to other disk servers
(in the same pool). dCache and CASTOR are different because they have
more complex placement algorithms; dCache probably does the Right
Thing(tm) by default.

4. Experiment liaison revisited - questions to ask at Ambleside

Particularly questions for the non-LHC experiments, who are normally
less well represented. Do they even know we exist and are able and
willing to help them? They may be interfacing to us via Jeremy.

* Which services do they expect from us (or would they find useful,
  e.g. syncatting, integrity checking, etc.)?
* Any future changes in data management, like those currently being
  discussed by the WLCG experiments?

5. Progress on dCache/Hadoop tests/writeups

To be revisited next week.

6. AOB

NOB

== CHAT ==

[10:00:19] Stephen Jones joined
[10:01:46] Brian Davies joined
[10:01:58] Wahid Bhimji i can do some
[10:02:03] Wahid Bhimji exactly
[10:02:25] Wahid Bhimji I was stalled trying think of suitable puns
[10:02:58] Jens Jensen http://www.gridpp.ac.uk/wiki/Dark_Data_clearance
[10:03:33] Christopher Walker joined
[10:07:42] Winnie Lacesso joined
[10:11:38] Matthew Doidge joined
[10:35:14] Wahid Bhimji thanks bye
[10:35:15] John Bland left
[10:35:16] James Thorne left
[10:35:16] Wahid Bhimji left
[10:35:18] Matthew Doidge left
[10:35:20] Winnie Lacesso left
[10:35:23] Brian Davies left
[10:35:24] David Crooks left
[10:35:27] Sam Skipsey left
[10:35:30] Alessandra Forti left
[10:35:33] Christopher Walker left