Attending: Brian, David, Elena, Ewan, Gareth, Jens, John B, John H, Matt, Rob, Robert, Sam, Steve, Tom

Apologies: Chris, Wahid

Other apologies: once again, Jens had to apologise for inadvertent technical glitches; this time not Vidyo's fault but apparently a problem with the speakers on the work laptop (or, more likely, the speaker drivers).

0. Update on operational blog posts.

Discussion of the CMS problem with dCache (cf. mailing list): Daniela reports "it mainly shows up in hammercloud tests not being able to access a perfectly healthy file as dCache closes the connection on a xrootd "set monitor info" request." There is a GGUS ticket against it.

Disks are failing at Manchester: is it just bad luck or something more fundamental? When RAID arrays are rebuilt, load increases on the remaining servers, which in turn raises the chance that they will do something unhelpful as well. The Adaptec controllers are only 4-5 years old, so they _should_ still be working OK - out of warranty, though.

1. Feedback on Wahid's proposed slides for the DPM workshop

Not as ready for puppet as the slides might indicate? CASTOR had Wait I/O states on disk servers; this seems to be the same problem as reported recently from Liverpool (albeit here on CASTOR, where Liverpool is of course on DPM). CASTOR, however, has a transfer manager which can set weights on transfers and cap the total at a specific limit (i.e. if GridFTP has more overhead on the server than xroot, the weight is set higher for GridFTP). There is also the xroot throttling patch we have discussed over the past few weeks: some sort of solution to the problem seems to be required.

2. Can we summarise our "small VO" policy?

- Should use either "standard" WLCG middleware or DIRAC. This seems sensible: the DIRAC resources are only for testing, so we should recommend the "normal" stuff. What about non-SRM access? A standard LFC may not be the best recommendation, as GFAL2 is "terrible" at talking to LFCs (see the gfal2 sketch at the end of these notes). Sites may also need to install things to support the VOs, but hopefully nothing much beyond the normal stuff.
- Should get and use a space token. Tokens-in-spaaace are meant to give them storage space while still allowing resources to be managed as they grow. Most SE implementations will also do this with path-based resource discovery, but not DPM, so maybe that's another thing for the workshop. What about WAN access? Ewan remembers that we previously discussed the number of sites they use and recommended they use as few sites as possible; e.g. 10 TB at a site should not be too onerous.
- Should volunteer someone to join the storage group mailing list and, ideally, occasionally this meeting.
- Data hygiene policies? Not having lots of empty directories? Does it matter?

3. GridFTP stuff

Curious about this redirect in DPM, and how does GridFTP work with IPv6? The redirect is apparently an optional feature to be supported by DPM (cf. Wahid's link). IPv6 seems to be supported, so there must be some means (non-standard?) of transferring the (half) socket information - maybe the Documentation (aka the source code) knows?
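For context on how that (half) socket information can be carried over IPv6: FTP-family protocols, GridFTP included, replace the IPv4-only PASV reply with the extended EPSV reply of RFC 2428, which carries only the data-channel port and reuses the address of the control connection, so it works for either address family. The following is a minimal, illustrative parser only (the reply strings are made-up examples, not taken from any DPM or GridFTP log):

  # Illustration: why PASV cannot carry IPv6 endpoints, and how EPSV (RFC 2428)
  # sidesteps the problem by sending only the port half of the socket address.
  import re

  def parse_pasv(reply):
      """Parse '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' - IPv4 only."""
      nums = re.search(r"\((\d+,\d+,\d+,\d+,\d+,\d+)\)", reply).group(1).split(",")
      host = ".".join(nums[:4])
      port = int(nums[4]) * 256 + int(nums[5])
      return host, port

  def parse_epsv(reply):
      """Parse '229 Entering Extended Passive Mode (|||port|)' - address-family neutral."""
      return int(re.search(r"\(\|\|\|(\d+)\|\)", reply).group(1))

  if __name__ == "__main__":
      # Example replies with a documentation-range address and an arbitrary port.
      print(parse_pasv("227 Entering Passive Mode (192,0,2,10,195,80)"))   # ('192.0.2.10', 50000)
      print(parse_epsv("229 Entering Extended Passive Mode (|||50000|)"))  # 50000

Whether DPM's GridFTP implementation does exactly this, or something additional for its redirect feature, is precisely the question for the documentation/source code.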
4. AOB

Chatlog from after Jens joined the second time:

Jens Jensen: (08/10/2014 10:11:52) I can hear Brian
Ewan Mac Mahon: (10:15 AM) Anyone want to bet ATLAS don't?
Matt Doidge: (10:17 AM) software raid all round then?
Ewan Mac Mahon: (10:17 AM) Ceph. Sam's writing a plugin.
John Bland: (10:18 AM) they're not Seagate drives are they?
Matt Doidge: (10:20 AM) They could just have been a duff batch
Ewan Mac Mahon: (10:21 AM) So, for the sake of the saved copy of the chat, no they're not Seagates, they're WD RE ones.
Michael Adam James Huffman: (10:21 AM) Log message. Hmm. I heard a rumour that if you sent a message, that magically retrieved the chat history from before you joined, but that appears not to be true.
Matt Doidge: (10:23 AM) Are the puppet modules presented in such a way that we can reverse engineer how to do config stuff? (Like the yaim bash scripts.)
Ewan Mac Mahon: (10:23 AM) It's odd to have a duff batch of disks die in four/five years. Drives usually die soon if they're bad, or when they're really old and knackered. I guess it's possible, but it's a bit unusual.
Matt Doidge: (10:24 AM) It'd have been a "not quite up to scratch" batch
Steve Jones: (10:24 AM) In the end, all our disks and the data on them will go. I just hope it's after I retire!
Ewan Mac Mahon: (10:25 AM) Just remember the official slogan of Tier 2 disk systems: "It's non-custodial!" A tier 2 disk failure should be, and increasingly often actually is, a minor increase in cache miss rate. An 'others' N2N library? Hmm.
John Bland: (10:38 AM) gotta go
Ewan Mac Mahon: (10:38 AM) Of course, you don't need a redirector if all your stuff's at one place. So, action on Sam to run a rucio instance then :-P
Samuel Cadellin Skipsey: (10:40 AM) Oh, I expect DIRAC will win, but we *don't have a production instance of it yet*. As Daniela keeps pointing out, the Imperial DIRAC is a *test* instance.
Ewan Mac Mahon: (10:45 AM) For now. And we don't have the greatest track record of getting VOs to test things unless we just con them that they're in production.
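Sketch referenced from item 2 (non-SRM access for small VOs): a minimal example of the kind of access check a small VO might run against a site using the gfal2 Python bindings, assuming gfal2-python is installed and a valid grid proxy is in place. The endpoint, DPM path and VO name below are placeholders, not real GridPP URLs.

  # Minimal sketch, not a production test: stat a file and list a directory
  # over a non-SRM (xrootd) endpoint with the gfal2 Python bindings.
  import gfal2

  ctx = gfal2.creat_context()  # gfal2 spells this "creat_context"

  # Placeholder URL: direct xrootd access to a file on the SE.
  url = "root://se.example.ac.uk//dpm/example.ac.uk/home/vo.example.org/test/file.dat"
  info = ctx.stat(url)
  print("size:", info.st_size)

  # Directory listing, e.g. to spot the "lots of empty directories" problem.
  parent = "root://se.example.ac.uk//dpm/example.ac.uk/home/vo.example.org/test/"
  print("entries:", ctx.listdir(parent))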