Minutes of the storage EVO meeting, 08 Sep 2010

Present:
	Glasgow: David
	Edinburgh: Wahid
	Liverpool: John, Stephen
	Lancaster: Matt
	RHUL: Govind
	RAL: James, Jens (chair+mins)

Apologies:
	RAL: Brian (on leave)


*** Don't forget the tasklist on
*** https://savannah.cern.ch/projects/srmsupportuk/

Discussion of the xfs problems (cf discussion on list)

- Some people have moved to ext4
- Probably "just" a kernel issue?
  Modifying block readahead mitigates problem.
  Manchester also used the tuning
  Recommended that all sites apply this tuning and test it.  Test at Glasgow.
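
A minimal sketch of how a site might check its current readahead
setting before and after tuning.  Assumptions, not taken from the
meeting: the target is the 16384 figure quoted for the T1 in the chat
log below, read here as 512-byte sectors (the unit used by
"blockdev --setra"); the minutes do not state the unit.  The kernel
reports the per-device value in KiB under
/sys/block/<dev>/queue/read_ahead_kb.  The function names are
illustrative only.

```python
import os

SECTOR_BYTES = 512  # "blockdev --setra" counts 512-byte sectors


def sectors_to_kb(sectors):
    """Convert a readahead value in 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES // 1024


def read_readahead_kb(sys_block="/sys/block"):
    """Return {device: readahead in KiB} for each block device found."""
    result = {}
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "queue", "read_ahead_kb")
        try:
            with open(path) as f:
                result[dev] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # no readable readahead setting for this entry
    return result


def needs_tuning(current_kb, target_sectors):
    """True if the current readahead is below the target."""
    return current_kb < sectors_to_kb(target_sectors)


if __name__ == "__main__":
    TARGET_SECTORS = 16384  # assumed unit: sectors, i.e. 8192 KiB
    for dev, kb in read_readahead_kb().items():
        flag = "below target" if needs_tuning(kb, TARGET_SECTORS) else "ok"
        print(f"{dev}: {kb} KiB ({flag})")
```

This only reads the setting; changing it (for example with
"blockdev --setra" as root) and making it persistent across reboots
is left to each site's own configuration management.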

Re ext4, Glasgow will also have new disk servers on ext4.  Maybe using
a staged deployment/upgrade approach.  Sam will send list of files to
Wahid for stress testing.

Discussed in dteam - some sites on SL4, so we don't want gLite 3.1
support to go away just yet.  At least, can we keep support for DPM in
gLite 3.1?  Jeremy will follow up.


GridPP followup

Simon at IC working on testing StoRM with Lustre - action to check and
cc Duncan.

Not much response from experiments in our pre-questionnaire; maybe the
time of year.  OTOH, they seem to be happyish.  Maybe more proactive
followup useful, cf CAMONT at Lancaster.  We should encourage others
to request resources.

PMB interested in other outreachy/knowledgeexchangey activities we
do.


Site issues

Lancaster - Storage OK, except BDII failed.  Type of component which
is easier to reinstall than to fix.  As for the DPM head node, plan to
upgrade to SL5 "within six weeks".

Liverpool - reinstall/upgrade within the next four weeks, before ATLAS
reprocessing at the end of Oct.  ATLAS keep filling DATADISK to 100%,
which makes the information publisher publish -0 with 1.7.2.  ATLAS
are slow to respond.  They place datasets, which is a coarse-grained
approach, so they can't back off easily as space fills up.

Glasgow - gLite 3.2.  New cluster being set up.

Edinburgh - new disk servers will use DPM rather than StoRM, as the
GPFS licence is somewhat pricey.  Will upgrade DPM within the next 6-8
weeks.  Cluster will keep using GPFS.

RHUL - running 1.7.2 on SL5.  Is any site running 1.7.4?
       GGUS ticket on ATLAS user unable to access file - send to list.
       Most useful thing to send would be RFIO log (a representative
       snippet).  Possibly related to hot file access?


== CHAT ==


[09:58:41] David Crooks joined
[09:58:43] Wahid Bhimji joined
[09:59:13] John Bland joined
[10:01:24] James Thorne joined
[10:05:35] James Thorne We have the read ahead set to 16384 on all disk servers at the T1
[10:06:08] Govind Songara joined
[10:07:50] Govind Songara left
[10:07:58] Govind Songara joined
[10:20:58] Matthew Doidge   User's echo suppression automatically activated
[10:34:02] James Thorne left
[10:36:16] John Bland left
[10:36:19] David Crooks left
[10:36:22] Govind Songara left
[10:36:27] Matthew Doidge left
[10:36:29] Wahid Bhimji left