Writeup of GridPP storage EVO meeting, 04 Jan 2012

Attending:
Edinburgh: Wahid
Lancaster: Matt
Oxford: Ewan
RAL: Jens
QMUL: Chris
RHUL: Govind

0. Quarterly report stuff - yes, it's time!

1. Review of services over Xmas break - did Santa visit your servers?

Matt: T2K transferred lots of data shortly before Christmas. Tried to drain servers, but the DPM drain was broken; this was easily fixed.

Ewan: may have had a T2K surge, but generally everything went fine, with about 1 TB of free space (may shrink some space tokens to free up more, or get a decommissioned disk server recommissioned). Still planning to do the SL5 upgrades.

Chris: hammered by ATLAS, with 100 TB going to GROUPDISK. The virtual machine hosting the BDII died just before Christmas, but over the break everything went fine. Looking for files whose checksums don't match; so far found none, but did find about 4,000 files where checksums are absent. Got a CASTOR tool to check checksums and made some minor modifications; it is under the Apache licence, so it could be sent to the list and may be generically useful. It can find files from yesterday and checksum those (a minimal illustrative sketch of this kind of scan is appended after the chat log).

Govind: not many issues, but just before Christmas the DPM head node's mirror disk failed twice in a week. Updated to DPM 1.8.3 and is taking advantage of the new feature to fill disk pools evenly.

Wahid: ECDF seemed to work fine; don't know about the other ScotGrid sites.

2. T2K use of GridPP

They have enough capacity to fill a site, but are perhaps not large enough to dedicate staff to liaise with us. They liked the discussion about space tokens. People who store serious amounts of data should be using space tokens rather than assuming they have infinite capacity - but the non-space-tokened storage is fairly small. Matt suggests that ops tests use space tokens as well. There is an argument that it may be better to reserve space for them, with the caveat that they may then be testing something that works when they should be discovering failures.

3. Lustre training?

A Lustre course is available and is expected to be good, but it is quite expensive, even with the academic discount. If there is significant demand within GridPP, perhaps we could arrange a separate course. It would be difficult for GridPP to pay large sums for training, but maybe the sites can pay...? The training can be tied into buying a cluster, because the vendor trains you in using the cluster with Lustre. Or do we have enough experience in-house (i.e. within GridPP), so to speak? Maybe the UK Lustre working group should be more active - a good idea, but so far nothing has happened.

4. More hardware stuff...? Procurement, technical talks, NGS?

Dell have new switches... should we cover this in the group, exceptionally? Much was covered at the combined hepsysman and ops meeting - we may need to revisit this when the kit comes in. How do you select the right switch, and how do you deploy it? ECDF is not buying Dell but IBM, to fit in with existing equipment. It is also worth considering the types of cables you can connect to the switch.

5. Data management and data monitoring and accounting records

Discussed above in connection with the T2K discussion and the QR - it would be useful to improve monitoring if we could.

6. Wiki cleanup

7. AOB

Chat log:
[10:01:13] Wahid Bhimji joined
[10:04:29] Ewan Mac Mahon joined
[10:05:34] Matthew Doidge joined
[10:13:35] Christopher Walker joined
[10:14:22] Govind Songara joined
[10:37:33] Wahid Bhimji: IBM BNT RackSwitch G8264R
[10:41:15] Wahid Bhimji: ok thanks
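
As a rough illustration of the "find files from yesterday and checksum those" scan mentioned under item 1: the minimal Python sketch below walks a directory, picks out files modified in the last 24 hours, and prints their Adler-32 checksums. It is not the CASTOR tool Chris referred to; the pool path, the 24-hour window and the choice of Adler-32 are all assumptions made for the example.

    #!/usr/bin/env python
    # Illustrative sketch only: scan a directory tree for files modified in
    # the last 24 hours and print an Adler-32 checksum for each.  The pool
    # path and time window are assumptions, not site configuration.
    import os
    import sys
    import time
    import zlib

    POOL_DIR = sys.argv[1] if len(sys.argv) > 1 else "/pool"  # assumed path
    CUTOFF = time.time() - 24 * 3600  # "files from yesterday"

    def adler32_of(path, blocksize=1024 * 1024):
        """Stream the file through zlib.adler32 and return a hex digest."""
        value = 1  # Adler-32 starting value
        with open(path, "rb") as f:
            while True:
                block = f.read(blocksize)
                if not block:
                    break
                value = zlib.adler32(block, value)
        return "%08x" % (value & 0xffffffff)

    for dirpath, dirnames, filenames in os.walk(POOL_DIR):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(full) >= CUTOFF:
                    print("%s %s" % (adler32_of(full), full))
            except OSError:
                pass  # file vanished or unreadable; skip it

In practice the checksums printed this way would then be compared against whatever the storage system's catalogue records, which is the part the CASTOR tool handles.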