Writeup of GridPP storage EVO meeting, 04 Jan 2012

Attending:
Edinburgh: Wahid
Lancaster: Matt
Oxford: Ewan
RAL: Jens
QMUL: Chris
RHUL: Govind

0. Quarterly report stuff - yes, it's time!

1. Review of services over Xmas break - did Santa visit your servers?

Matt: T2K transferred lots of data shortly before Christmas. Tried to drain servers, but the DPM drain was broken; this was easily fixed.

Ewan: may have had a T2K surge, but generally everything went fine, with about 1 TB of free space (may shrink some space tokens to free up more, or get a decommissioned disk server recommissioned). Still planning to do the SL5 upgrades.

Chris: hammered by ATLAS, with 100 TB going to GROUPDISK. The virtual machine hosting the BDII died just before Christmas, but over the break everything went fine. Looking for files whose checksums don't match; so far found none, but did find about 4,000 files where checksums are absent. Got a CASTOR tool to check checksums and made some minor modifications; it is under the Apache licence, so it could be sent to the list and may be generically useful. It can find files from yesterday and checksum those (a minimal illustrative sketch of this kind of scan is appended after the chat log).

Govind: not many issues, but just before Christmas the DPM head node's mirror disk failed twice in a week. Updated to DPM 1.8.3 and is taking advantage of the new feature to fill disk pools evenly.

Wahid: ECDF seemed to work fine; don't know about the other ScotGrid sites.

2. T2K use of GridPP

They have enough capacity to fill a site, but are perhaps not large enough to dedicate staff to liaise with us. They liked the discussion about space tokens. People who store serious amounts of data should be using space tokens rather than assuming they have infinite capacity - but the non-space-tokened storage is fairly small. Matt suggests that ops tests use space tokens as well. There is an argument that it may be better to reserve space for them, with the caveat that they may then be testing something that works when they should be discovering failures.

3. Lustre training?

A Lustre course is available and is expected to be good, but it is quite expensive, even with the academic discount. If there is significant demand within GridPP, perhaps we could arrange a separate course. It would be difficult for GridPP to pay large sums for training, but maybe the sites can pay...? The training can be tied into buying a cluster, because the vendor trains you in using the cluster with Lustre. Or do we have enough experience in-house (i.e. within GridPP), so to speak? Maybe the UK Lustre working group should be more active - a good idea, but so far nothing has happened.

4. More hardware stuff...? Procurement, technical talks, NGS?

Dell have new switches... should we cover this in the group, exceptionally? Much was covered at the combined hepsysman and ops meeting - we may need to revisit this when the kit comes in. How do you select the right switch, and how do you deploy it? ECDF is not buying Dell but IBM, to fit in with existing equipment. It is also worth considering the types of cables you can connect to the switch.

5. Data management and data monitoring and accounting records

Discussed above in connection with the T2K discussion and the QR - it would be useful to improve monitoring if we could.

6. Wiki cleanup

7. AOB

Chat log:
[10:01:13] Wahid Bhimji joined
[10:04:29] Ewan Mac Mahon joined
[10:05:34] Matthew Doidge joined
[10:13:35] Christopher Walker joined
[10:14:22] Govind Songara joined
[10:37:33] Wahid Bhimji: IBM BNT RackSwitch G8264R
[10:41:15] Wahid Bhimji: ok thanks
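
As a rough illustration of the "find files from yesterday and checksum those" scan mentioned under item 1: the minimal Python sketch below walks a directory, picks out files modified in the last 24 hours, and prints their Adler-32 checksums. It is not the CASTOR tool Chris referred to; the pool path, the 24-hour window and the choice of Adler-32 are all assumptions made for the example.

    #!/usr/bin/env python
    # Illustrative sketch only: scan a directory tree for files modified in
    # the last 24 hours and print an Adler-32 checksum for each.  The pool
    # path and time window are assumptions, not site configuration.
    import os
    import sys
    import time
    import zlib

    POOL_DIR = sys.argv[1] if len(sys.argv) > 1 else "/pool"  # assumed path
    CUTOFF = time.time() - 24 * 3600  # "files from yesterday"

    def adler32_of(path, blocksize=1024 * 1024):
        """Stream the file through zlib.adler32 and return a hex digest."""
        value = 1  # Adler-32 starting value
        with open(path, "rb") as f:
            while True:
                block = f.read(blocksize)
                if not block:
                    break
                value = zlib.adler32(block, value)
        return "%08x" % (value & 0xffffffff)

    for dirpath, dirnames, filenames in os.walk(POOL_DIR):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(full) >= CUTOFF:
                    print("%s %s" % (adler32_of(full), full))
            except OSError:
                pass  # file vanished or unreadable; skip it

In practice the checksums printed this way would then be compared against whatever the storage system's catalogue records, which is the part the CASTOR tool handles.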