Attending: Ewan, John B, John H, Matt D, Matt V, Winnie, Sam, Jens (chair+mins), David, Gareth, Robert, Brian, Elena, Lukasz

0. Operational blog posts

We had five blog posts from the past quarter: four by Jens and one by Brian - and both Brian and Jens were subject to purdah during the first third of the quarter.

An operational issue has been reported: a huge FTS backlog for ATLAS transfers from Glasgow (GLA) to RAL. These are multicore jobs, moving ATLAS RDO data of about 5 GB per file. Previously we had 2-3 simultaneous transfers; now it is more like 20. There is a curious mix of transfer speeds: one 5 GB file may transfer in less than a minute while others are 100 times slower. No correlation has been found - fast and slow transfers can even come from the same disk server. GridFTP packet loss would lead to the speed dropping and then ramping back up, but no packets are being lost at Glasgow; RAL, however, is seeing packet loss. It could also be a JANET intervention - there was one earlier which disrupted STFC's internal communications (i.e. between Daresbury and RAL). perfSONAR does not replicate the performance patterns seen/required by GridFTP, but the perfSONAR data from Glasgow to other sites in the UK are wildly different [so not just RAL?]. The ATLAS workflow may also cause problems: how many transfers are actually scheduled? The bandwidth to RAL has improved, but there is a huge backlog: 47,000 transfers, 24,000 jobs. Glasgow is still seeing 5% of these curiously slow transfers.

** Brian will find or sketch a diagram of the RAL Tier 1 connections, from the site router to UKLight and JANET.

1. Summary/impressions of the digital preservation for HEP workshop at CERN - ISO 16363 for HEP?

Matthew Viljoen had attended the one-week workshop on ISO 16363, the PTAB high-level training course for managers of digital repositories, held at CERN two weeks ago. Jamie Shiers is currently leading the LHC data preservation activities. There is an agreement signed by many Tier 1s, including all the European ones except INFN and STFC (i.e. RAL's organisation). The aim is to converge on long-term storage of data, and also to build trust in each other as data centres, using ISO standardisation. The standard includes auditing and certification of trustworthy digital repositories. It was developed by the PTAB in 2014 and describes the requirements rather than how to implement them; it also includes requirements which can be tested/audited. It builds on OAIS, which is ISO 14721.

Data is preserved not just at the bit level: there must also be descriptions of how the data is to be interpreted. "Can a five year old understand it?" - given that you often need a degree in physics to make sense of physics data, that is perhaps quite a challenge. Ewan asks if it is even worth it: the data centres focus on bit preservation [as we do today, and as DPHEP requires today] whereas the knowledge of what is in the data resides elsewhere - not in the data centre, but with the users (or experts) - so the user experts have the archivist role.

There are lots of bespoke formats, particularly in HEP: where others such as the neutron source and climate modelling communities tend to adapt existing formats like HDF5 (which has standard libraries in Linux; see the sketch below), physics has always tended to make up its own formats (and protocols). Other repositories - libraries, the humanities - have been doing this for ages and are probably a decade ahead of HEP. However, in addition to the bespoke formats, HEP also has more complex workflows and generally much larger data volumes.
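[As an aside, a self-describing format carries its own interpretation hints alongside the data. A minimal h5py sketch - the dataset name, units and values below are invented purely for illustration - would look something like this:

    #!/usr/bin/env python
    # Minimal illustration of a self-describing format: the file carries the data
    # *and* enough metadata (units, a description) to make sense of it later.
    # Dataset name, units and values are invented for this example.
    import h5py
    import numpy as np

    with h5py.File("example.h5", "w") as f:
        dset = f.create_dataset("event_energies", data=np.random.rand(1000) * 100.0)
        dset.attrs["units"] = "GeV"
        dset.attrs["description"] = "Reconstructed event energies from a toy sample"

    # Years later, a standard library call reads back both the data and its meaning:
    with h5py.File("example.h5", "r") as f:
        dset = f["event_energies"]
        print("%d values in %s" % (len(dset), dset.attrs["units"]))

A bespoke binary format, by contrast, needs its reading code (and that code's build environment) preserved alongside it.]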
[The humanities sometimes find they have much more metadata than data, depending on the discipline, whereas HEP is clearly the other way around.]

Compare this to compiling an old Fortran (excuse me, FORTRAN) 66 or 77 program versus compiling C++ today or in the future: often the language core is standardised, but there are lots of vendor extensions and other oddities built into real code. Compare this also to running your old DOS or Windows 95 games. We can buy them (paying for them again!) from vendors who specialise in adapting old (or even current) games to the Windows-platform-du-jour (or even porting them to Macs or Linux) - so we are essentially paying for the ability to keep playing the old games. Sometimes there are open source efforts as well, when enough people with programming skills get nostalgic about their wasted^W youth, but these are generally reimplementations. More generally, there could be a business model in offering format shifting of data - ensuring that data is always available in a usable format [and that is what Preservica is doing: preservation in the cloud]. Alternatively (and back to physics), we could try to preserve the running environment in a VM, but then it will need a hosting environment [there is a story somewhere about an emulator which was kept so it could run another emulator which ran something interesting - and from there, of course, it is turtles all the way down, as the chat log points out].

2. Update on data of "small" or new VOs?

I think this should be a semi-regular feature, but it doesn't have to be long - just the $\Delta$ since last week. Sam will contact Paul about "new DIRAC", which may be more sensible than "old DIRAC" in knowing about SEs and the like. Jens also had a note about getting some feedback from LIGO as a new user; of course, if LIGO have other priorities (as they have had recently) then it will take a bit longer. The other DiRAC has had intermittent practical proxy problems. Problems are raised and resolved on the list, so the information is automatically archived (searchably), which, so the theory goes, will help when the next sites join.

3. GridFTP to CEPH: report from meeting last week?

Ian Johnson at STFC has made good progress implementing a GridFTP interface to CEPH. Aligning the GridFTP block sizes with those used by the RADOS striper unsurprisingly led to an improvement in speed - up to line speed, which is as much as we can ask for.

4. Update on Ewan's proposal

Jens had read it and suggested adding (a) a risk register, (b) a proposed timescale, (c) a list of expertise and experts in GridPP that we could rely on, and (d) costing. The document was a PDF, but contact Ewan for edit permissions on the original, which is sitting in the cloud somewhere.

5. AOB

Luke reported a new user community approaching him about using his storage system; they want performance numbers in particular, and to know how the storage system would deal with streaming data. At the Tier 1, CASTOR is generally stress tested by submitting a number of jobs to the CE (or at least to the cluster) and having them probe the (test or preproduction instance of) CASTOR. Since Luke is interested in saturating the network link [that's the spirit!], which is 10Gb at Bristol, the sensible thing to do would be to submit to GridPP sites in the UK and have them read from or write to Bristol as appropriate, and watch what happens. The RAL code probably uses RFIO or xroot, but writing a few lines of script which read a file from a known location is hardly rocket science - a sketch follows below.
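A minimal sketch of such a probe, assuming a dteam-readable test file already sits at a known location on the Bristol SE (the source URL, scratch path and output format below are illustrative, not real endpoints), could be shipped as the payload of a grid job:

    #!/usr/bin/env python
    # A few lines of probe script: fetch a known file from the Bristol SE,
    # time the transfer and print the achieved rate.
    # The source URL and scratch path are placeholders, not real endpoints.
    import os
    import subprocess
    import sys
    import time

    SOURCE = "srm://BRISTOL-SE.example/dpm/phy.bris.ac.uk/home/dteam/probe/testfile"  # placeholder
    DEST_PATH = "/tmp/probe-%d" % os.getpid()

    start = time.time()
    rc = subprocess.call(["gfal-copy", SOURCE, "file://" + DEST_PATH])
    elapsed = time.time() - start

    if rc != 0:
        print("TRANSFER FAILED rc=%d after %.1fs" % (rc, elapsed))
        sys.exit(rc)

    size_mb = os.path.getsize(DEST_PATH) / (1024.0 * 1024.0)
    print("TRANSFER OK %.1f MB in %.1fs (%.1f MB/s)" % (size_mb, elapsed, size_mb / elapsed))
    os.remove(DEST_PATH)

The printed rates end up in the job stdout and can be harvested with the usual output retrieval; running a batch of such jobs per site (plus a write variant with gfal-copy pointing the other way) would be one way of loading the Bristol link.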
Since GridPP sites are sensible and support dteam, it should be possible to submit such jobs to CEs around the UK and see what happens. The performance data is intended for a grant application; the user is storing high-resolution scans of crystal surfaces.

Brian reports that Matt had built a new WN on CVMFS with a modern version of GFAL. They will now test whether it resolves some of the issues reported earlier.

There is a GDB today: http://indico.cern.ch/event/319749/

Ewan Mac Mahon: (08/07/2015 10:01:24)
Hullo. Anything in the perfsonar traceroute logs? Can you see if it's going different ways at different times? If you can't see anything, of course, that doesn't mean nothing's happening.

Brian Davies @RAL-LCG2: (10:13 AM)
https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/?vo=&source_se=srm:%2F%2Fsvr018.gla.scotgrid.ac.uk&dest_se=srm:%2F%2Fsrm-atlas.gridpp.rl.ac.uk&time_window=1

Lukasz Kreczko: (10:14 AM)
Would DMLite + HDFS make an interesting blog post?

Brian Davies @RAL-LCG2: (10:15 AM)
@Lukasz, yes it would

Samuel Cadellin Skipsey: (10:15 AM)
Yes, Lukasz - I did ask if you wanted to give a presentation to this meeting about it a while back, and anything presentation-worthy is also blog-post-worthy.

Lukasz Kreczko: (10:15 AM)
Sure, can do both

Jens Jensen: (10:19 AM)
There'd be jargon if we explained the grid, too...

Brian Davies @RAL-LCG2: (10:20 AM)
Another interesting plot for FTS transfers for GLA-RAL (for the last 48 hours): http://dashb-atlas-ddm.cern.ch/ddm2/#activity=%28Data+Brokering,Data+Consolidation,Data+Export+Test,Debug,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default,rucio-integration,test,test%3AT0_T1+export,test%3AT1_T2+export,testactivity10,testactivity20,testactivity70%29&date.interval=2880&dst.cloud=%28%22UK%22%29&dst.site=%28%22RAL-LCG2%22%29&grouping.dst=%28cloud,site,token%29&grouping.src=%28cloud,site%29&m.content=%28d_dof,d_dot,d_eff,d_faf,d_plf,s_eff,s_err,s_suc,s_thr,t_eff,t_err,t_suc,t_thr%29&p.bin.size=h&p.grouping=activity&src.site=%28GLA%29&tab=transfer_plots

Gareth Douglas Roy: (10:23 AM)
That's actually what is slightly worrying: overall the transfers seem reasonably okay, but we still have 37K queued at the site.

Ewan Mac Mahon: (10:32 AM)
We do also have a lot more data than humanities tend to, which does make a difference. Stashing VMs is certainly a common idea that comes up.

Samuel Cadellin Skipsey: (10:32 AM)
C++ is both an ANSI and several ISO standards, IIRC

Ewan Mac Mahon: (10:33 AM)
Anyone want to bet that HEP C++ is standards compliant, though? Plus, there's only so far a standard gets you when you're looking at a standard library call to something you don't have any more. This is where the 'stash a VM' idea comes from; if you make it self-contained then all you need to do is be able to run x86_64 VMs. And there's likely to be a lot of people interested in running x86-64 VMs.

Jens Jensen: (10:36 AM)
There is Wolfram's CDF: computable document format

Paige Winslowe Lacesso: (10:36 AM)
Sorry, must - !

Ewan Mac Mahon: (10:38 AM)
There was an interesting point in there about proprietary software too - does your licence even allow you to archive your stuff?

David Crooks: (10:38 AM)
Yep

Ewan Mac Mahon: (10:39 AM)
GOG started off entirely selling packages of DOS games bundled with DOSBox.

Jens Jensen: (10:39 AM)
Still do with some games. They run with 100% CPU. Only now it's a core :-) E.g. Zork
Ewan Mac Mahon: (10:40 AM)
Ah, but so long as someone builds something in the future that allows us to run circa 2015 linux userspace code, we can just run dosbox on that, and the game on that. Turtles /all/ the way down. I think one of the other points in favour of 'stash a vm' is that it's quite 'rich'; you don't try to predict which things future data archeologists will find useful, you just throw everything at them and let them pick through it.

Gareth Douglas Roy: (10:42 AM)
Along the turtles line... funny if you haven't seen it: https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript

Ewan Mac Mahon: (10:45 AM)
So far my timescale is 'in time to be useful', which does indeed need some firming up.

I'm not sure that direct streaming sounds like the best possible approach. I think we usually suggest people stream to a local 'T0' right next to their kit and do batch transfers from there. The other thing adds a lot of potential failure points that can effectively take your experiment offline.
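[A footnote on Ewan's last point: the 'local buffer plus batch transfer' pattern is roughly what a minimal sketch like the following does - the spool directory, destination endpoint and retry count are all illustrative:

    #!/usr/bin/env python
    # Sketch of "stream to a local buffer, then batch-transfer" for an instrument:
    # files land in SPOOL_DIR locally; this loop pushes them to grid storage with
    # gfal-copy and only removes the local copy once the transfer has succeeded.
    # The spool directory and destination endpoint below are placeholders.
    import os
    import subprocess
    import time

    SPOOL_DIR = "/data/spool"                                    # local 'T0' buffer
    DEST_BASE = "srm://SOME-SE.example/dpm/example/home/vo/raw"  # placeholder endpoint
    RETRIES = 3

    while True:
        # in practice, only pick up files the instrument has finished writing
        for name in sorted(os.listdir(SPOOL_DIR)):
            src = "file://" + os.path.join(SPOOL_DIR, name)
            dst = DEST_BASE + "/" + name
            for attempt in range(RETRIES):
                if subprocess.call(["gfal-copy", src, dst]) == 0:
                    os.remove(os.path.join(SPOOL_DIR, name))
                    break
            # a file that fails all attempts stays in the spool for the next pass
        time.sleep(60)

The point is exactly the one made above: the instrument only ever talks to local disk, so a wobbly wide-area link delays the catch-up rather than taking the experiment offline.]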