Attn: Jens (chair+mins), John H, John B, Matt, Winnie, Daniel, Govind, Marcus, Gareth, David, Steve, Ewan, Sam
Apologies: Brian

0. Operational blog posts
Reminder to people to blog when they do something Interesting(tm).

1. Review of (storage/data) issues for the WLCG workshop and the ATLAS Jamboree
Nothing new/else suggested. For ATLAS at least it would be worth following up on the catalogue testing (right format, right location, instructions for all SEs? - probably everything is fine or close enough at this point) plus the "T2C testing".

2. Discussion of current issues such as T2C testing, ATLAS space tokens, non-LHC VOs and other loose ends - may be quick if there is no news. Maybe also the state of the docs in the wiki. (These issues were raised during the round table discussion (q.v.))

3. Round table
We are overdue for a quick round table of storage and data related issues and interesting things. It would be good to do them slightly more frequently - if nothing else to give people who don't often say much a chance to say something - but we don't always get around to it.

John H: upgrade to DPM 1.8.10 on SL6 - probably taking a "big bang" approach: get as much as possible ready, announce the downtime, do the work and come back up. Intending to use puppet - but it's nontrivial - are there updated instructions? Instructions on the wiki are somewhat out of date, but there is of course the upstream documentation (from DPM). Using a smallish pool as a testbed prior to the upgrade. Marcus needs to upgrade nodes anyway, so could take a look at the documentation. Steve points out that we should just delete (well, archive) it if it is out of date and no longer needed (i.e. all the necessary docs are available from DPM); if we need our own supplementary information then we need to update it. Also looking at options for extending storage, e.g. with iSCSI on SL6 or SL7 - does anyone have relevant experience? Related to that, maybe it's time we had a technology review.

John B: same as John H; needs a new head node in particular. Using CentOS 7, then migrating.

Steve: same as John B.

Matt: same as Johns H and B, and Steve. New head node needed. There is extra space available on the university cluster, which could perhaps be made available via DPM. Also interested in New and Exciting(tm) stuff, e.g. SRM-less storage. Could be volunteered to try out new stuff... [that's the spirit!]

Winnie: unstable storage [also discussed last week under operational issues]. Would be nice if Bristol's support could do some testing?

Daniel: standalone xroot supporting multiple VOs on a single security configuration. Aiming to turn it into production, on Lustre. Everyone else: "Ooo! Interesting!" - turn into a blog post?

Govind: struggling to drain old storage, as it is very slow - only 1/4 TB/day. Tried the dmlite drain feature, which managed a more respectable 2-3 TB in a few hours. Sam points out they need passwordless ssh from the head nodes to the pool nodes in order to speed up the draining - would be much faster than RFIO. Problems on other nodes: dpmck -n checks the whole filesystem, not just selected nodes, and there is currently no way to check only selected nodes. Using -n (dry run) without stopping DPM would be OK but might show spurious problems, which could be resolved manually by inspecting timestamps, or by running the script again later and comparing the outputs (see the sketch below). Ewan points out it is expected to run in ~< 1 hr; the dry run would give an indication of the run time since the scanning is the slow bit; the fixing would be relatively quick.
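[Not from the meeting: a minimal sketch of the "run the dry run twice and compare the outputs" idea above. It assumes the dry run writes a plain-text report with one problem record per line; the report format, file names and script name are assumptions for illustration, not taken from the dpmck documentation.]

#!/usr/bin/env python
# compare_dryruns.py - compare two dpmck-style dry-run reports taken some
# time apart while DPM is still running. Entries reported by both runs are
# likely real problems; entries seen only once are probably transient
# (files in flight) and can be re-checked later or inspected by timestamp.
#
# Assumption: one problem record per line, '#' lines are comments.

import sys

def load_report(path):
    """Return the set of non-empty, non-comment lines from a dry-run report."""
    with open(path) as handle:
        return {line.strip() for line in handle
                if line.strip() and not line.startswith('#')}

def main(first_report, second_report):
    first = load_report(first_report)
    second = load_report(second_report)

    persistent = sorted(first & second)   # reported by both runs: worth fixing
    transient = sorted(first ^ second)    # reported by only one run: re-check later

    print("%d problems reported by both runs:" % len(persistent))
    for entry in persistent:
        print("  " + entry)
    print("%d entries only seen once (likely transient):" % len(transient))
    for entry in transient:
        print("  " + entry)

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("usage: compare_dryruns.py <first-report> <second-report>")
    main(sys.argv[1], sys.argv[2])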
Ewan: DPM on SL7; currently has a single-node install using puppet. Amusingly, the WLCG repo name was changed by CERN, so that needs updating. Also pondered the /dpm path name, which can be removed with a single setting. The aim is to pare the config down to a single disk pool, then integrate it with DPM. [It is worth documenting this sort of stuff, or sharing it, since as we have seen with the other sites there is lots of interest in doing DPM with puppet.]

Gareth: discussions at Glasgow: getting new storage would mean an opportunity to retire old hardware and install HDFS, which should also work with the older non-retired hardware. dCache?!

Sam: mostly spent time preparing for the WLCG workshop - gathering requirements from sites, feedback from UK and FR - and teaching...

David: same as Gareth...

Jens: working in Copious Spare Time(tm) on GLUE2 for CASTOR with Rob Appleyard; lots of EUDAT stuff; occasional work on Indigo (think iRODS++) when there is a moment; DiRAC.

As Jens is away next week, and it is the week of the WLCG workshop (vidyo link already available!), we are expecting to cancel next week's meeting. However, the meeting slot *is* available, and if people should feel the need or desire to connect and have a chat, that is absolutely fine.

$ For all you NVM fans, you might be interested in having a look at the presentations (they're all online) from the recent SNIA NVM summit: http://www.snia.org/events/non-volatile-memory-nvm-summit Even if new technologies make you go "pshah!" (I know, unlikely), you will still find fascinating stuff like performance and benchmarking methodologies, musings on data efficiencies, and words of wisdom in general.

$$ BTW, we're hiring! If you want to be a castorologist, or you know someone who might want to be a castorologist: http://www.topcareer.jobs/Vacancy/irc215357_6221.aspx

Chat log:

John Bland: (27/01/2016 09:59:47)
good moaning

Matt Doidge: (09:59 AM)
I was just pissing by...

John Hill: (10:01 AM)
Sorry??

Matt Doidge: (10:01 AM)
Continuing on the 'allo 'allo skit. John started it!

Ewan Mac Mahon: (10:13 AM)
That was a bit too breaky-uppy. I'm pretty sure I missed critical bits.
As a general principle, I have no idea why anyone would run iSCSI unless someone forced them, though.

Daniel Peter Traynor: (10:19 AM)
ok

Samuel Cadellin Skipsey: (10:23 AM)
https://indico.cern.ch/event/432642/attachments/1201318/1748385/dpmdbck2015.pdf

Govind: (10:27 AM)
can you hear me.. looks like my sound gone..

Ewan Mac Mahon: (10:28 AM)
Yes, you can't hear us. Anyway, I shall stop now.
I think I missed about the first 5 mins.

John Bland: (10:34 AM)
our vague plan is (once we know what we're doing): install new pool nodes on C7/puppet, migrate the headnode to C7/puppet, then maybe a rolling drain/reinstall on the remaining pool nodes
this assumes all the different os/dpm versions interoperate

Ewan Mac Mahon: (10:35 AM)
I suspect that if anything, it might be a matter of configuring the new version to turn off the new stuff. But that's total guesswork.
Of course, a pool node can offer clever services without bothering anyone if the head node simply ignores them (e.g. imagine they had web servers, but the head node never directs HTTP requests to them - you'd be fine).
What particularly concerns me is the new gridftp redirection magic.

Samuel Cadellin Skipsey: (10:37 AM)
Generally, yeah, new stuff won't work. (Gridftp redirection, for example.)
Some of the new things fall back to old mechanisms (less efficiently), and some just break stuff.
Ewan Mac Mahon: (10:38 AM)
The interesting point is whether the new pool nodes will respond to old-style requests as directed to them by the old-style head node. But we'll see soon enough.

Samuel Cadellin Skipsey: (10:38 AM)
Gridftp redirection, IIRC, falls back to an inefficient mechanism which makes things slower.

Ewan Mac Mahon: (10:39 AM)
Slower than the new hotness, or slower than it used to be?

Samuel Cadellin Skipsey: (10:39 AM)
Slower than it used to be. The fallback I'm aware of essentially picks a random gridftp server on a random pool node to use.
Although, I'm not actually sure if that fallback works for *pool node* compatibility.

Ewan Mac Mahon: (10:40 AM)
Hmm. Can an old pool node get a file from another pool node? Do they rfio tunnel them?