Attending: Brian, David, Elena, Ewan, Gareth, Jens, John B, John H, Matt, Rob, Robert, Sam, Steve, Tom

Apologies: Chris, Wahid

Other apologies: once again, Jens had to apologise for inadvertent technical glitches; this time not Vidyo's fault but apparently a problem with the speakers on the work laptop (or, more likely, the speaker drivers).

0. Update on operational blog posts.

Discussion of the CMS problem with dCache (cf. mailing list): Daniela reports "it mainly shows up in hammercloud tests not being able to access a perfectly healthy file as dCache closes the connection on a xrootd "set monitor info" request." There is a GGUS ticket against it.

Disks are failing at Manchester: is it just bad luck or something more fundamental? When RAID arrays are rebuilt, load increases on the remaining servers, which in turn raises the chance that they will do something unhelpful as well. The Adaptec controllers are only 4-5 years old, so they _should_ still be working OK - out of warranty, though.

1. Feedback on Wahid's proposed slides for the DPM workshop

Not as ready for puppet as the slides might indicate? CASTOR had Wait I/O states on disk servers; this seems to be the same problem as reported recently from Liverpool (albeit here on CASTOR, where Liverpool is of course on DPM). CASTOR, however, has a transfer manager which can set weights on transfers and cap the total at a specific limit (i.e. if GridFTP has more overhead on the server than xroot, the weight is set higher for GridFTP). There is also the xroot throttling patch we have discussed over the past few weeks: some sort of solution to the problem seems to be required.

2. Can we summarise our "small VO" policy?

- Should use either "standard" WLCG middleware or DIRAC. This seems sensible: the DIRAC resources are only for testing, so we should recommend the "normal" stuff. What about non-SRM access? A standard LFC may not be the best recommendation, as GFAL2 is "terrible" at talking to LFCs (see the gfal2 sketch at the end of these notes). Sites may also need to install things to support the VOs, but hopefully nothing much beyond the normal stuff.
- Should get and use a space token. Tokens-in-spaaace are meant to give them storage space while still allowing resources to be managed as they grow. Most SE implementations will also do this with path-based resource discovery, but not DPM, so maybe that's another thing for the workshop. What about WAN access? Ewan remembers that we previously discussed the number of sites they use and recommended they use as few sites as possible; e.g. 10 TB at a site should not be too onerous.
- Should volunteer someone to join the storage group mailing list and, ideally, occasionally this meeting.
- Data hygiene policies? Not having lots of empty directories? Does it matter?

3. GridFTP stuff

Curious about this redirect in DPM, and how does GridFTP work with IPv6? The redirect is apparently an optional feature to be supported by DPM (cf. Wahid's link). IPv6 seems to be supported, so there must be some means (non-standard?) of transferring the (half) socket information - maybe the Documentation (aka the source code) knows?
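For context on how that (half) socket information can be carried over IPv6: FTP-family protocols, GridFTP included, replace the IPv4-only PASV reply with the extended EPSV reply of RFC 2428, which carries only the data-channel port and reuses the address of the control connection, so it works for either address family. The following is a minimal, illustrative parser only (the reply strings are made-up examples, not taken from any DPM or GridFTP log):

  # Illustration: why PASV cannot carry IPv6 endpoints, and how EPSV (RFC 2428)
  # sidesteps the problem by sending only the port half of the socket address.
  import re

  def parse_pasv(reply):
      """Parse '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' - IPv4 only."""
      nums = re.search(r"\((\d+,\d+,\d+,\d+,\d+,\d+)\)", reply).group(1).split(",")
      host = ".".join(nums[:4])
      port = int(nums[4]) * 256 + int(nums[5])
      return host, port

  def parse_epsv(reply):
      """Parse '229 Entering Extended Passive Mode (|||port|)' - address-family neutral."""
      return int(re.search(r"\(\|\|\|(\d+)\|\)", reply).group(1))

  if __name__ == "__main__":
      # Example replies with a documentation-range address and an arbitrary port.
      print(parse_pasv("227 Entering Passive Mode (192,0,2,10,195,80)"))   # ('192.0.2.10', 50000)
      print(parse_epsv("229 Entering Extended Passive Mode (|||50000|)"))  # 50000

Whether DPM's GridFTP implementation does exactly this, or something additional for its redirect feature, is precisely the question for the documentation/source code.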
4. AOB

Chatlog from after Jens joined the second time:

Jens Jensen: (08/10/2014 10:11:52) I can hear Brian
Ewan Mac Mahon: (10:15 AM) Anyone want to bet ATLAS don't?
Matt Doidge: (10:17 AM) software raid all round then?
Ewan Mac Mahon: (10:17 AM) Ceph. Sam's writing a plugin.
John Bland: (10:18 AM) they're not Seagate drives are they?
Matt Doidge: (10:20 AM) They could just have been a duff batch
Ewan Mac Mahon: (10:21 AM) So, for the sake of the saved copy of the chat, no they're not Seagates, they're WD RE ones.
Michael Adam James Huffman: (10:21 AM) Log message. Hmm. I heard a rumour that if you sent a message, that magically retrieved the chat history from before you joined, but that appears not to be true.
Matt Doidge: (10:23 AM) Are the puppet modules presented in such a way that we can reverse engineer how to do config stuff? (Like the yaim bash scripts.)
Ewan Mac Mahon: (10:23 AM) It's odd to have a duff batch of disks die in four/five years. Drives usually die soon if they're bad, or when they're really old and knackered. I guess it's possible, but it's a bit unusual.
Matt Doidge: (10:24 AM) It'd have been a "not quite up to scratch" batch
Steve Jones: (10:24 AM) In the end, all our disks and the data on them will go. I just hope it's after I retire!
Ewan Mac Mahon: (10:25 AM) Just remember the official slogan of Tier 2 disk systems: "It's non-custodial!" A tier 2 disk failure should be, and increasingly often actually is, a minor increase in cache miss rate. An 'others' N2N library? Hmm.
John Bland: (10:38 AM) gotta go
Ewan Mac Mahon: (10:38 AM) Of course, you don't need a redirector if all your stuff's at one place. So, action on Sam to run a rucio instance then :-P
Samuel Cadellin Skipsey: (10:40 AM) Oh, I expect DIRAC will win, but we *don't have a production instance of it yet*. As Daniela keeps pointing out, the Imperial DIRAC is a *test* instance.
Ewan Mac Mahon: (10:45 AM) For now. And we don't have the greatest track record of getting VOs to test things unless we just con them that they're in production.
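Sketch referenced from item 2 (non-SRM access for small VOs): a minimal example of the kind of access check a small VO might run against a site using the gfal2 Python bindings, assuming gfal2-python is installed and a valid grid proxy is in place. The endpoint, DPM path and VO name below are placeholders, not real GridPP URLs.

  # Minimal sketch, not a production test: stat a file and list a directory
  # over a non-SRM (xrootd) endpoint with the gfal2 Python bindings.
  import gfal2

  ctx = gfal2.creat_context()  # gfal2 spells this "creat_context"

  # Placeholder URL: direct xrootd access to a file on the SE.
  url = "root://se.example.ac.uk//dpm/example.ac.uk/home/vo.example.org/test/file.dat"
  info = ctx.stat(url)
  print("size:", info.st_size)

  # Directory listing, e.g. to spot the "lots of empty directories" problem.
  parent = "root://se.example.ac.uk//dpm/example.ac.uk/home/vo.example.org/test/"
  print("entries:", ctx.listdir(parent))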