Attending: Gareth, John H, Marcus, Jens (chair+mins), Winnie, Robert, Tom Byrne, Tom W, Sam, Duncan, Daniel, Brian, Raul, Matt D, Pete, Steve

0. Only three blog posts this quarter. No operational issues to report.

1. Round Table

Gareth: not much to report, storage-wise.

Duncan: data transfers for DIRAC. Can now submit to FTS from DIRAC; unfortunately the system was designed to use a "shifter's proxy" rather than the user's proxy, so it needs a rethink. Also interested in T2Cs: while CPU efficiencies are normally the purview of the dteam meeting (aka TB-SUPPORT), T2Cs could put additional load on other sites, which is what we are aiming to test with the Oxford experiment. Duncan noted that QMUL had been copying files over from RHUL - a large number of files per job - where remote IO might be the better option, and he thinks the overall CPU efficiencies will also be a measure of data transfer/access efficiency. Brian points out that with remote access, latency becomes a factor, as does the increased load on the T2Ds. Hence the need to experiment, which should take into account file sizes, the ability (or not) to share files between jobs, etc. US sites are "looking for EU involvement". Action: Sam to prod Ilya about the above.

John: checksum stuff seems to be sorted. Aiming to sort out puppet config issues, but maybe not before Christmas as other things have higher priority. Currently just the single pool on 1.8.10 using SL5.

Marcus: not much operationally; some hardware is old and procurement may happen eventually. Test DPM available; also interested in distributed storage in general.

Winnie: asked whether other sites were running dmlite with HDFS. In general it should be possible (see chat) as long as a plugin is available - or one could of course write a plugin; there is also general interest in one for CEPH. It should even be possible to have different pools on different file systems.

Robert: two pool nodes down. Trying puppet. DPM 1.8.9.

Sam: looking at the ATLAS namespace dump problem. Apparently the script trawls the entire database every time a dump is created for an endpoint, and it needs to create 19 of them. Each trawl takes an hour, so 19 hours for something that should only be done once - not particularly efficient (see the sketch after the round table). Also DPM upgrade testing and testing the redirectors.

Matt D: pool nodes currently installed by hand; will move to Ansible. Going to 1.8.10.

Raul: no news other than what was reported last week. Can help with Ansible.

Steve: old hardware. Should the balance between CPU and disk be revisited? Duncan mentioned the commitment is by T2 (not by individual site, so NorthGrid, SouthGrid, London, ScotGrid). There is a spreadsheet; see Pete's links in chat.

Pete: old disk servers decommissioned, 80 TB.

Daniel: planned swap over to Lustre, almost finished and scheduled with ATLAS for next week. Also discussing T2D with LHCb. However, with SRM LHCb could determine whether access was local, but apparently not in the SRM-less case - worth clarifying with LHCb (i.e. Raja) at the next opportunity.

Brian: working on the CASTOR namespace dumps [there are two flavours: filenames only, and also metadata such as length, timestamp and checksums] - empty directories slow down the generation of the dump, which can take 3-4 hours for something like ATLASDATADISK. Also tracking PRODDISK removal at T2s, plus the WAN tuning and DiRAC. And dteam on CASTOR [the service class has been changed] and the test S3 Echo endpoint - which could be an interesting topic for next Wednesday?

Jens: DiRAC technical discussions with Sam, Brian, Lydia; Leicester is about to start, too, so updated instructions are needed. Probably need to start over with Durham, as discussed last week. Can a file split across tarball segments be recovered? Jens' tests (on DIRAC-USERS) said no; Sam will try to replicate (see the sketch below). A file wholly contained in a single segment can be extracted even if tar reports an error. Also liaising with Liverpool re Alloy 2, which is now called Indigo (no relation to INDIGO-DataCloud); Alloy 1 was based on iRODS and Alloy 2 - Indigo - on Cassandra.
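On Sam's and Brian's namespace-dump items above: one way to avoid trawling the namespace once per endpoint would be a single full trawl whose records are partitioned into the per-endpoint dump files in one pass. The sketch below is purely illustrative and is not the ATLAS/DPM dump script itself; it assumes the full trawl is already available as a plain-text file with the path in the first column, and the prefix-to-dump mapping (ENDPOINTS, with made-up paths and file names) is hypothetical.

    # Hypothetical sketch: produce per-endpoint namespace dumps from ONE full
    # trawl instead of re-trawling the namespace once per endpoint.
    import sys
    from contextlib import ExitStack

    # Illustrative mapping of namespace prefixes to dump files (19 in practice).
    ENDPOINTS = {
        "/dpm/example.ac.uk/home/atlas/atlasdatadisk/": "atlasdatadisk.dump",
        "/dpm/example.ac.uk/home/atlas/atlasscratchdisk/": "atlasscratchdisk.dump",
        "/dpm/example.ac.uk/home/atlas/atlaslocalgroupdisk/": "atlaslocalgroupdisk.dump",
    }

    def partition(full_dump_path):
        """Read the full namespace dump once, appending each record to the
        dump file of whichever endpoint prefix it falls under."""
        with ExitStack() as stack:
            outputs = {prefix: stack.enter_context(open(fname, "w"))
                       for prefix, fname in ENDPOINTS.items()}
            with open(full_dump_path) as full:
                for line in full:
                    if not line.strip():
                        continue
                    path = line.split()[0]          # first column: file path
                    for prefix, out in outputs.items():
                        if path.startswith(prefix):
                            out.write(line)
                            break                   # a file belongs to one endpoint

    if __name__ == "__main__":
        partition(sys.argv[1])   # e.g. python partition_dumps.py full_namespace.txt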
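On the DiRAC tarball-segment question in Jens' item: a small, self-contained way of replicating the test with Python's tarfile module, assuming the segments are produced by cutting a plain (uncompressed) tar at a fixed byte offset. File names, sizes and the segment size are made up; the expectation, matching what was reported, is that files wholly inside the first segment come back while the file cut at the segment boundary does not.

    # Sketch of the test: build a tar, cut it into fixed-size segments, then
    # see what can be recovered from the first segment alone.
    import os
    import tarfile

    SEGMENT_SIZE = 1024 * 1024   # 1 MiB segments (illustrative)

    # 1. Create some test files and a plain (uncompressed) tar containing them.
    for name, size in [("a.dat", 300000), ("b.dat", 300000), ("c.dat", 900000)]:
        with open(name, "wb") as f:
            f.write(os.urandom(size))
    with tarfile.open("archive.tar", "w") as tar:
        for name in ("a.dat", "b.dat", "c.dat"):
            tar.add(name)

    # 2. Cut the archive at a fixed offset, as splitting it into segments would.
    with open("archive.tar", "rb") as f:
        first_segment = f.read(SEGMENT_SIZE)
    with open("segment_00", "wb") as f:
        f.write(first_segment)

    # 3. See which members can be read back from the truncated first segment.
    recovered, lost = [], []
    with tarfile.open("segment_00", mode="r:") as seg:
        while True:
            try:
                member = seg.next()
            except (tarfile.ReadError, EOFError):
                break                       # no further headers past the cut
            if member is None:
                break
            try:
                data = seg.extractfile(member).read()
                complete = (len(data) == member.size)
            except (tarfile.ReadError, EOFError):
                complete = False            # tar(file) complains at the truncation
            (recovered if complete else lost).append(member.name)

    print("recovered:", recovered)   # expect a.dat and b.dat
    print("lost:", lost)             # expect c.dat, split across the boundary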
Chat log:

Gareth Douglas Roy: (02/12/2015 10:10:14)
And Janet's problems yesterday

Brian Davies @ RAL: (10:11 AM)
so there was a report that direct io rates were inversely proportional to the rtt of the distance

Samuel Cadellin Skipsey: (10:11 AM)
because it's seeky, and therefore latency limited, yeah

Paige Winslowe Lacesso: (10:21 AM)
Microphone no work :(

Duncan Rand: (10:22 AM)
ATLAS FAX CHEP: https://indico.cern.ch/event/214784/session/6/contribution/265/attachments/341115/476003/atlas-fax-chep-2013-v3.pdf

Paige Winslowe Lacesso: (10:24 AM)
One thing I want to know: Bristol is AFAIK the only UK site running dmlite with an hdfs backend; my impression is other sites are interested in this but want to see "how it goes". Given that dmlite+hdfs is going pretty well (I think) at Bristol, are any other UK sites thinking of doing the same?

Jens Jensen: (10:24 AM)
Thanks, Duncan. Thanks, Winnie. Don't think anyone else is doing hdfs like that?

Paige Winslowe Lacesso: (10:28 AM)
Blow me down. Somehow my impression was that other UK sites were looking with interest at dmlite+hdfs. So much for impressions...

Daniel Peter Traynor: (10:28 AM)
everything is interesting

Samuel Cadellin Skipsey: (10:29 AM)
Winnie: the issue is making the change to an entirely different storage infrastructure layer.

Duncan Rand: (10:29 AM)
Raul, you sound like you are in the Brazilian jungle

Samuel Cadellin Skipsey: (10:29 AM)
HDFS is certainly a better storage solution than the "DPM filesystem" approach, but the cost of moving is large.

raul: (10:30 AM)
Kind of. I was in the jungle last week. Real Brazilian jungle.

Samuel Cadellin Skipsey: (10:30 AM)
But we're definitely interested in how well dmlite + (proper distributed parallel storage solution) works.

Duncan Rand: (10:31 AM)
ceph

Samuel Cadellin Skipsey: (10:31 AM)
Or, indeed, ceph, but there's no official effort in DPM Core for supporting a ceph backend. (Which pains me, too.)

Matt Doidge: (10:31 AM)
I thought dmlite was filesystem agnostic?

Samuel Cadellin Skipsey: (10:32 AM)
Well, it can support any filesystem/storage system, with an appropriate plugin. There's a "generic VFS" plugin, but I have no idea how well it would work against ceph.

Daniel Peter Traynor: (10:32 AM)
lustre? could add a test dmlite in front of our lustre if that would be of interest

Peter Gronbech: (10:32 AM)
The MoU from August this year, which is near enough, is here: https://archive.gridpp.ac.uk/deployment/status/reports/reports.html

Samuel Cadellin Skipsey: (10:33 AM)
Dan: sure, lustre should work. It's quite posixy, so I guess the standard plugins would work with it.
(The hdfs plugin understands hdfs, on the level of "bits of this file are on nodes X, Y, Z, so I can transfer the file optimally from services running on X, Y or Z.")

Daniel Peter Traynor: (10:34 AM)
would need a non-puppet config method
is there some docs/web/wiki on dmlite?

Samuel Cadellin Skipsey: (10:36 AM)
Yeah, one mo. https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite is the root of the docs
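A footnote to the dmlite plugin discussion above: dmlite selects its backends through plugin-loading directives in its configuration (typically /etc/dmlite.conf plus the files it includes from /etc/dmlite.conf.d/), so "dmlite in front of HDFS or Lustre" essentially means loading a different pool-driver plugin alongside the usual namespace plugin. The snippet below is only a rough sketch of the shape of such a file; apart from the LoadPlugin directive itself, the plugin and parameter names here are illustrative and should be checked against the documentation Sam linked.

    # /etc/dmlite.conf.d/hdfs.conf -- illustrative sketch, names not guaranteed
    # Namespace still served by the usual DPM/MySQL catalogue plugin:
    LoadPlugin plugin_mysql_ns /usr/lib64/dmlite/plugin_mysql.so

    # The pool-driver plugin selects the storage backend; swapping this line
    # (and its parameters) is what changes the backend under dmlite:
    LoadPlugin plugin_hdfs_pooldriver /usr/lib64/dmlite/plugin_hdfs.so

    # Backend-specific parameters (hypothetical names):
    HdfsNameNode namenode.example.ac.uk
    HdfsPort 8020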