Minutes of the storage EVO meeting, 08 Sep 2010

Present:
Glasgow: David
Edinburgh: Wahid
Liverpool: John, Stephen
Lancaster: Matt
RHUL: Govind
RAL: James, Jens (chair+mins)

Apologies:
RAL: Brian (on leave)

*** Don't forget the tasklist on ***
https://savannah.cern.ch/projects/srmsupportuk/

Discussion of the xfs problems (cf discussion on list)
- Some people have moved to ext4
- Probably "just" a kernel issue?
Modifying block readahead mitigates problem. Manchester also used the tuning.
Recommended for all sites to do this, and test. Test at Glasgow.
Re ext4, Glasgow will also have new disk servers on ext4. Maybe using a staged deployment/upgrade approach.
Sam will send list of files to Wahid for stress testing.

Discussed in dteam - some sites on SL4, so we don't want gLite 3.1 support to go away just yet. At least can we keep support for DPM in gLite 3.1. Jeremy will follow up.

GridPP followup
Simon at IC working on testing StoRM with Lustre - action to check and cc Duncan.
Not much response from experiments in our pre-questionnaire; maybe the time of year. OTOH, they seem to be happyish. Maybe more proactive followup useful, cf CAMONT at Lancaster. We should encourage others to request resources.
PMB interested in other outreachy/knowledgeexchangey activities we do.

Site issues
Lancaster - Storage OK, except BDII failed. Type of component which is easier to reinstall than to fix. As for the DPM head node, plan to upgrade to SL5 "within six weeks"
Liverpool - reinstall/upgrade within the next four weeks, before ATLAS reprocessing end of Oct. ATLAS keep filling DATADISK to 100% which makes the information publisher publish -0 with 1.7.2. ATLAS slow to respond. They place datasets which is a coarse-grained approach so can't back off easily as space fills up.
Glasgow - gLite 3.2. New cluster being set up.
Edinburgh - new disk servers for DPM rather than StoRM, GPFS licence somewhat pricy. Will upgrade DPM within next 6-8 weeks. Cluster will keep using GPFS.
RHUL - running 1.7.2 on SL5. Is any site running 1.7.4? GGUS ticket on ATLAS user unable to access file - send to list. Most useful thing to send would be RFIO log (a representative snippet). Possibly related to hot file access?

== CHAT ==
[09:58:41] David Crooks joined
[09:58:43] Wahid Bhimji joined
[09:59:13] John Bland joined
[10:01:24] James Thorne joined
[10:05:35] James Thorne We have the read ahead set to 16384 on all disk servers at the T1
[10:06:08] Govind Songara joined
[10:07:50] Govind Songara left
[10:07:58] Govind Songara joined
[10:20:58] Matthew Doidge User's echo suppression automatically activated
[10:34:02] James Thorne left
[10:36:16] John Bland left
[10:36:19] David Crooks left
[10:36:22] Govind Songara left
[10:36:27] Matthew Doidge left
[10:36:29] Wahid Bhimji left
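
Sketch of the readahead tuning
For sites wanting to try the block readahead tuning from the xfs discussion, the following is a minimal sketch, not a tested recipe. It assumes that the 16384 quoted for the T1 is the figure reported by `blockdev --getra`, which counts 512-byte sectors; the minutes do not state the unit. Under that assumption it corresponds to 8192 KiB in the sysfs file `/sys/block/<device>/queue/read_ahead_kb`. The device name `sdb` and the helper function names are hypothetical, and writing to sysfs needs root and does not persist across reboots.

```python
"""Sketch: inspect or set block-device readahead through sysfs (Linux only)."""
from pathlib import Path

# Assumption: `blockdev --getra/--setra` values are counts of 512-byte sectors.
SECTOR_BYTES = 512


def sectors_to_kib(sectors: int) -> int:
    """Convert a blockdev-style readahead (sectors) to KiB as used by sysfs."""
    return sectors * SECTOR_BYTES // 1024


def kib_to_sectors(kib: int) -> int:
    """Convert a sysfs readahead value (KiB) to 512-byte sectors."""
    return kib * 1024 // SECTOR_BYTES


def readahead_path(device: str, sysfs_root: str = "/sys/block") -> Path:
    """Path of the readahead setting for a block device such as 'sdb'."""
    return Path(sysfs_root) / device / "queue" / "read_ahead_kb"


def get_readahead_kib(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read the current readahead in KiB."""
    return int(readahead_path(device, sysfs_root).read_text().strip())


def set_readahead_kib(device: str, kib: int, sysfs_root: str = "/sys/block") -> None:
    """Write a new readahead in KiB (requires root on a real system)."""
    readahead_path(device, sysfs_root).write_text(f"{kib}\n")
```

On a disk server this might be used as `set_readahead_kib("sdb", sectors_to_kib(16384))`, followed by `get_readahead_kib("sdb")` to confirm the value before running the stress test.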