Minutes of the storage EVO meeting, 28 July 2010

Present:
Glasgow: Sam
Birmingham: Chris C
Edinburgh: Wahid
Liverpool: John, Stephen
Bristol: Winnie
Sheffield: Elena
QMUL: Chris W
Oxford: Ewan, Pete
Lancaster: Matt, Peter
RAL: Brian, Jens (chair+mins)

0. Review of actions (see below)

See below...

0.1. Review of tasklist

https://savannah.cern.ch/projects/srmsupportuk/ (click tasks)

Main question is whether the priorities are right. Is this the best way to maintain them? ... and the buglist is quite old; do we need it?

These are (probably) the highest-priority ones for the next quarter:
* ext4 evaluation (comparing to xfs)
* finishing the plan for dealing with orphaned files (see item 3)
* dCache evaluation

0.2. Don't forget the blog!

So far only one item for July!

1. Review of agenda for GridPP in Ambleside

http://www.gridpp.ac.uk/gridpp25/programme.html
... storage items and related!

ACTION on Sam to collar Tony about section 2, which is about storage and data management (or specifically, about LCG s.a.d.m.).

2. Status of testing

DPM - cf action 399 - ongoing, see report from prev. week. Should probably be a task rather than an action.
dCache - Brian - talked to Rob about getting resources. Pondering whether they should be virtual or not.
StoRM - Chris? See below (action).

3. Dark data discussion

http://www.gridpp.ac.uk/wiki/Dark_Data_clearance

We discussed the syncat for StoRM: how is a list of files generated? What should the next steps be?
* We should volunteer a small VO with not too many files, maybe one which is not just local. Some VOs use lcg-cr without depending much on the catalogue.

ACTION Jens to talk to T2K to volunteer them.

4. Round table - storage-related issues, things of interest

... not enough time for this :-(

5. AOB

Brian raised the issue of brownout and TCP tuning. How do you know when something is wrong? When you have a hundred RFIO processes, tons of IOWAITs, and your transfer rate is a tenth of the capacity.
It is thought the problem can be alleviated by resetting the TCP window size: the DPM YAIM install sets it to 32x the default, and this may be causing the problem. How does this relate to the RFIO buffer size? The relation is not yet understood. The RFIO buffer size depended on the type of data being accessed (cf. the study of access patterns), but this "brownout" behaviour is seen even for streaming data.

Need a volunteer site to test out the TCP buffer size. We could volunteer Glasgow, but would the test be influenced by other activity there? Otherwise, wait for another site to get into trouble...

Chris W has a tender for compute and would be interested in experiences from other sites.

ACTIONS

391 05/05/2010 Investigate StoRM metrics for QMUL Chris+et al Med Open
Done, closed. When changing space, StoRM "forgets" about used space, but it can be reset manually with du. This publishing seems broken with 1.5 on Lustre, but it obviously works at CNAF with GPFS. There is a specific call in GPFS which Lustre doesn't have. File access permissions seem to work in 1.5. However, an SL5 version is needed; it was due Really Soon Now(tm). Can upgrade now if experiments really want it, but would prefer to wait for the SL5 version.

399 26/05/2010 Test DPM 1.7.4-6 on SL5 and report Sam Med Open
Done, closed. Main problem was the VOMS problem (see minutes from 21.07.10). Also, it was 1.7.4-7.

400 26/05/2010 Investigate upgrading B'ham to SL5 Brian+Chris C Med Open
Done, closed. Currently only ALICE files are left on the head node. Perhaps it's best to ask ALICE if they really need them and/or whether they would mind putting them somewhere else temporarily. In general, it is probably not ideal to have data storage on the head node.

401 02/06/2010 Clean up the wiki ALL Low Open
Ongoing... what is the best way to do this? NGS maintain a spreadsheet (as a Google doc) with a list of who owns which web page, but in a wiki you can read the history... Ewan suggests using the comments page.
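Action 391 notes that StoRM's used-space figure can be reset manually with du. This is a minimal sketch, assuming a mounted POSIX view of the storage area. It estimates roughly the figure `du -sb` would report, meaning apparent file sizes rather than allocated blocks; plain `du` counts allocated blocks, which can differ on Lustre. The function name `used_bytes` is hypothetical, and how the resulting number is fed back into StoRM is not covered here.

```python
import os
import stat


def used_bytes(root: str) -> int:
    """Sum the apparent sizes of regular files under root (hypothetical helper).

    Roughly comparable to `du -sb root`, except that directory entries
    themselves are not counted. Hard-linked files are counted once,
    symbolic links are skipped, and unreadable directories are silently
    ignored (os.walk's default behaviour).
    """
    seen = set()  # (st_dev, st_ino) pairs already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except FileNotFoundError:
                # File vanished between listing and stat; skip it.
                continue
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device files, etc.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total
```

For example, `used_bytes("/mnt/lustre_0/storm_3")` would walk the QMUL storage area path quoted in the chat. On a large Lustre filesystem, a full walk like this can take a long time and put load on the metadata server.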
== CHAT ==
[09:55:20] Sam Skipsey brb, getting coffee
[09:58:17] John Bland joined
[09:59:03] Stephen Jones joined
[09:59:13] Ewan Mac Mahon joined
[09:59:27] Ewan Mac Mahon Morning.
[09:59:38] Chris Curtis joined
[10:00:03] Brian Davies joined
[10:00:11] Queen Mary, U London London, U.K. joined
[10:00:23] Jens Jensen http://storage.esc.rl.ac.uk/weekly/
[10:00:45] Wahid Bhimji joined
[10:01:07] Brian Davies Would like to add tcp tuning for DPM and brownouts to AOB
[10:02:51] Winnie Lacesso joined
[10:05:16] Pete Gronbech joined
[10:06:13] Elena Korolkova joined
[10:09:52] Jens Jensen https://savannah.cern.ch/projects/srmsupportuk/
[10:11:16] Queen Mary, U London London, U.K. i do hope that link is in the wiki...
[10:11:57] Jens Jensen Can add to the wiki
[10:14:31] Matthew Doidge joined
[10:15:03] Queen Mary, U London London, U.K. cd /mnt/lustre_0/storm_3 ; lfs find -t file . | sed %^.%srm://se03.esc.qmul.ac.uk%
[10:15:16] Queen Mary, U London London, U.K. Or something like that will produce a list of surls
[10:16:59] Jens Jensen http://www.gridpp.ac.uk/gridpp25/programme.html
[10:17:50] Peter Love joined
[10:25:45] Jens Jensen http://www.gridpp.ac.uk/wiki/Dark_Data_clearance
[10:27:40] Pete Gronbech left
[10:37:41] Queen Mary, U London London, U.K. Congrats
[10:38:09] Winnie Lacesso We want blog pix!
[10:38:42] Elena Korolkova left
[10:38:45] Sam Skipsey left
[10:38:46] John Bland left
[10:38:48] Wahid Bhimji left
[10:38:49] Ewan Mac Mahon left
[10:38:50] Brian Davies i dont' do CPU!!!
[10:38:50] Winnie Lacesso left
[10:38:56] Stephen Jones left
[10:38:57] Brian Davies left
[10:38:58] Chris Curtis left
[10:39:01] Peter Love left
[10:39:02] Matthew Doidge left
[10:39:09] Queen Mary, U London London, U.K. left
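The dark data item asked how a list of files is generated for the StoRM syncat, and QMUL suggested an `lfs find ... | sed` one-liner in the chat. As typed there, the sed expression lacks the `s` command and quoting. Something like `sed 's%^\.%srm://se03.esc.qmul.ac.uk%'` is probably what was meant.

The following is a minimal Python sketch of the same idea. It assumes that a file's SURL is simply the endpoint prefix followed by its path relative to the storage-area root. Real StoRM SURLs may also need a port or a storage-area prefix, depending on the site's namespace configuration. The function name `list_surls` is hypothetical.

```python
import os
import stat
from typing import Iterator


def list_surls(root: str, endpoint: str) -> Iterator[str]:
    """Yield one SURL per regular file under root (hypothetical helper).

    Mirrors the chat one-liner: the leading "." of each relative path is
    replaced by the endpoint, e.g. ./t2k/f1 -> srm://host/t2k/f1.
    Symbolic links are skipped, as `lfs find` restricted to regular
    files would skip them.
    """
    prefix = endpoint.rstrip("/")
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if not stat.S_ISREG(os.lstat(full).st_mode):
                continue  # not a regular file (e.g. a symlink)
            rel = os.path.relpath(full, root)
            yield prefix + "/" + rel.replace(os.sep, "/")
```

Streaming the SURLs from a generator, rather than building a list, keeps memory use flat even for a VO with many files. The output can then be written one SURL per line for comparison against the catalogue dump.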