Attending: Ewan, John B, John H, Matt D, Matt V, Winnie, Sam, Jens (chair+mins), David, Gareth, Robert, Brian, Elena, Lukasz

0. Operational blog posts

We had five blog posts from the past quarter: four by Jens and one by Brian - and both Brian and Jens were subject to purdah during the first third of the quarter.

An operational issue has been reported: a huge FTS backlog for ATLAS transfers from Glasgow (GLA) to RAL. These are multicore jobs, moving ATLAS RDO data of about 5 GB per file. Previously we had 2-3 simultaneous transfers; now it is more like 20. There is a curious mix of transfer speeds: one 5 GB file may transfer in less than a minute while others are 100 times slower. No correlation has been found - fast and slow transfers can even come from the same disk server. GridFTP packet loss would lead to the speed dropping and then ramping back up, but no packets are being lost at Glasgow; RAL, however, is seeing packet loss. It could also be a JANET intervention - there was one earlier which disrupted STFC's internal communications (i.e. between Daresbury and RAL). perfSONAR does not replicate the performance patterns seen/required by GridFTP, but the perfSONAR data from Glasgow to other sites in the UK are wildly different [so not just RAL?]. The ATLAS workflow may also cause problems: how many transfers are actually scheduled? The bandwidth to RAL has improved, but there is a huge backlog: 47,000 transfers, 24,000 jobs. Glasgow is still seeing 5% of these curiously slow transfers.

** Brian will find or sketch a diagram of the RAL Tier 1 connections, from the site router to UKLight and JANET.

1. Summary/impressions of the digital preservation for HEP workshop at CERN - ISO 16363 for HEP?

Matthew Viljoen had attended the one-week workshop on ISO 16363, the PTAB high-level training course for managers of digital repositories, held at CERN two weeks ago. Jamie Shiers is currently leading the LHC data preservation activities. There is an agreement signed by many Tier 1s, including all the European ones except INFN and STFC (i.e. RAL's organisation). The aim is to converge on long-term storage of data, and also to build trust in each other as data centres, using ISO standardisation. The standard includes auditing and certification of trustworthy digital repositories. It was developed by the PTAB in 2014 and describes the requirements rather than how to implement them; it also includes requirements which can be tested/audited. It builds on OAIS, which is ISO 14721.

Data is preserved not just at the bit level: there must also be descriptions of how the data is to be interpreted. "Can a five year old understand it?" - given that you often need a degree in physics to make sense of physics data, that is perhaps quite a challenge. Ewan asks if it is even worth it: the data centres focus on bit preservation [as we do today, and as DPHEP requires today] whereas the knowledge of what is in the data resides elsewhere - not in the data centre, but with the users (or experts) - so the user experts have the archivist role.

There are lots of bespoke formats, particularly in HEP: where others such as the neutron source and climate modelling communities tend to adapt existing formats like HDF5 (which has standard libraries in Linux; see the sketch below), physics has always tended to make up its own formats (and protocols). Other repositories - libraries, the humanities - have been doing this for ages and are probably a decade ahead of HEP. However, in addition to the bespoke formats, HEP also has more complex workflows and generally much larger data volumes.
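[As an aside, a self-describing format carries its own interpretation hints alongside the data. A minimal h5py sketch - the dataset name, units and values below are invented purely for illustration - would look something like this:

    #!/usr/bin/env python
    # Minimal illustration of a self-describing format: the file carries the data
    # *and* enough metadata (units, a description) to make sense of it later.
    # Dataset name, units and values are invented for this example.
    import h5py
    import numpy as np

    with h5py.File("example.h5", "w") as f:
        dset = f.create_dataset("event_energies", data=np.random.rand(1000) * 100.0)
        dset.attrs["units"] = "GeV"
        dset.attrs["description"] = "Reconstructed event energies from a toy sample"

    # Years later, a standard library call reads back both the data and its meaning:
    with h5py.File("example.h5", "r") as f:
        dset = f["event_energies"]
        print("%d values in %s" % (len(dset), dset.attrs["units"]))

A bespoke binary format, by contrast, needs its reading code (and that code's build environment) preserved alongside it.]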
[The humanities sometimes find they have much more metadata than data, depending on the discipline, whereas HEP is clearly the other way around.]

Compare this to compiling an old Fortran (excuse me, FORTRAN) 66 or 77 program versus compiling C++ today or in the future: often the language core is standardised, but there are lots of vendor extensions and other oddities built into real code. Compare this also to running your old DOS or Windows 95 games. We can buy them (paying for them again!) from vendors who specialise in adapting old (or even current) games to the Windows-platform-du-jour (or even porting them to Macs or Linux) - so we are essentially paying for the ability to keep playing the old games. Sometimes there are open source efforts as well, when enough people with programming skills get nostalgic about their wasted^W youth, but these are generally reimplementations. More generally, there could be a business model in offering format shifting of data - ensuring that data is always available in a usable format [and that is what Preservica is doing: preservation in the cloud]. Alternatively (and back to physics), we could try to preserve the running environment in a VM, but then it will need a hosting environment [there is a story somewhere about an emulator which was kept so it could run another emulator which ran something interesting - and from there, of course, it is turtles all the way down, as the chat log points out].

2. Update on data of "small" or new VOs?

I think this should be a semi-regular feature, but it doesn't have to be long - just the $\Delta$ since last week. Sam will contact Paul about "new DIRAC", which may be more sensible than "old DIRAC" in knowing about SEs and the like. Jens also had a note about getting some feedback from LIGO as a new user; of course, if LIGO have other priorities (as they have had recently) then it will take a bit longer. The other DiRAC has had intermittent practical proxy problems. Problems are raised and resolved on the list, so the information is automatically archived (searchably), which, so the theory goes, will help when the next sites join.

3. GridFTP to CEPH: report from meeting last week?

Ian Johnson at STFC has made good progress implementing a GridFTP interface to CEPH. Aligning the GridFTP block sizes with those used by the RADOS striper unsurprisingly led to an improvement in speed - up to line speed, which is as much as we can ask for.

4. Update on Ewan's proposal

Jens had read it and suggested adding (a) a risk register, (b) a proposed timescale, (c) a list of expertise and experts in GridPP that we could rely on, and (d) costing. The document was a PDF, but contact Ewan for edit permissions on the original, which is sitting in the cloud somewhere.

5. AOB

Luke reported a new user community approaching him about using his storage system; they want performance numbers in particular, and to know how the storage system would deal with streaming data. At the Tier 1, CASTOR is generally stress tested by submitting a number of jobs to the CE (or at least to the cluster) and having them probe the (test or preproduction instance of) CASTOR. Since Luke is interested in saturating the network link [that's the spirit!], which is 10Gb at Bristol, the sensible thing to do would be to submit to GridPP sites in the UK and have them read from or write to Bristol as appropriate, and watch what happens. The RAL code probably uses RFIO or xroot, but writing a few lines of script which read a file from a known location is hardly rocket science - a sketch follows below.
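A minimal sketch of such a probe, assuming a dteam-readable test file already sits at a known location on the Bristol SE (the source URL, scratch path and output format below are illustrative, not real endpoints), could be shipped as the payload of a grid job:

    #!/usr/bin/env python
    # A few lines of probe script: fetch a known file from the Bristol SE,
    # time the transfer and print the achieved rate.
    # The source URL and scratch path are placeholders, not real endpoints.
    import os
    import subprocess
    import sys
    import time

    SOURCE = "srm://BRISTOL-SE.example/dpm/phy.bris.ac.uk/home/dteam/probe/testfile"  # placeholder
    DEST_PATH = "/tmp/probe-%d" % os.getpid()

    start = time.time()
    rc = subprocess.call(["gfal-copy", SOURCE, "file://" + DEST_PATH])
    elapsed = time.time() - start

    if rc != 0:
        print("TRANSFER FAILED rc=%d after %.1fs" % (rc, elapsed))
        sys.exit(rc)

    size_mb = os.path.getsize(DEST_PATH) / (1024.0 * 1024.0)
    print("TRANSFER OK %.1f MB in %.1fs (%.1f MB/s)" % (size_mb, elapsed, size_mb / elapsed))
    os.remove(DEST_PATH)

The printed rates end up in the job stdout and can be harvested with the usual output retrieval; running a batch of such jobs per site (plus a write variant with gfal-copy pointing the other way) would be one way of loading the Bristol link.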
Since GridPP sites are sensible and support dteam, it should be possible to submit such jobs to CEs around the UK and see what happens. The performance data is intended for a grant application; the user is storing high-resolution scans of crystal surfaces.

Brian reports that Matt had built a new WN on CVMFS with a modern version of GFAL. They will now test whether it resolves some of the issues reported earlier.

There is a GDB today: http://indico.cern.ch/event/319749/

Ewan Mac Mahon: (08/07/2015 10:01:24)
Hullo. Anything in the perfsonar traceroute logs? Can you see if it's going different ways at different times? If you can't see anything, of course, that doesn't mean nothing's happening.

Brian Davies @RAL-LCG2: (10:13 AM)
https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/?vo=&source_se=srm:%2F%2Fsvr018.gla.scotgrid.ac.uk&dest_se=srm:%2F%2Fsrm-atlas.gridpp.rl.ac.uk&time_window=1

Lukasz Kreczko: (10:14 AM)
Would DMLite + HDFS make an interesting blog post?

Brian Davies @RAL-LCG2: (10:15 AM)
@Lukasz, yes it would

Samuel Cadellin Skipsey: (10:15 AM)
Yes, Lukasz - I did ask if you wanted to give a presentation to this meeting about it a while back, and anything presentation-worthy is also blog-post-worthy.

Lukasz Kreczko: (10:15 AM)
Sure, can do both

Jens Jensen: (10:19 AM)
There'd be jargon if we explained the grid, too...

Brian Davies @RAL-LCG2: (10:20 AM)
Another interesting plot for FTS transfers for GLA-RAL (for the last 48 hours): http://dashb-atlas-ddm.cern.ch/ddm2/#activity=%28Data+Brokering,Data+Consolidation,Data+Export+Test,Debug,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default,rucio-integration,test,test%3AT0_T1+export,test%3AT1_T2+export,testactivity10,testactivity20,testactivity70%29&date.interval=2880&dst.cloud=%28%22UK%22%29&dst.site=%28%22RAL-LCG2%22%29&grouping.dst=%28cloud,site,token%29&grouping.src=%28cloud,site%29&m.content=%28d_dof,d_dot,d_eff,d_faf,d_plf,s_eff,s_err,s_suc,s_thr,t_eff,t_err,t_suc,t_thr%29&p.bin.size=h&p.grouping=activity&src.site=%28GLA%29&tab=transfer_plots

Gareth Douglas Roy: (10:23 AM)
That's actually what is slightly worrying: overall the transfers seem reasonably okay, but we still have 37K queued at the site.

Ewan Mac Mahon: (10:32 AM)
We do also have a lot more data than humanities tend to, which does make a difference. Stashing VMs is certainly a common idea that comes up.

Samuel Cadellin Skipsey: (10:32 AM)
C++ is both an ANSI and several ISO standards, IIRC

Ewan Mac Mahon: (10:33 AM)
Anyone want to bet that HEP C++ is standards compliant, though? Plus, there's only so far a standard gets you when you're looking at a standard library call to something you don't have any more. This is where the 'stash a VM' idea comes from; if you make it self-contained then all you need to do is be able to run x86_64 VMs. And there's likely to be a lot of people interested in running x86-64 VMs.

Jens Jensen: (10:36 AM)
There is Wolfram's CDF: computable document format

Paige Winslowe Lacesso: (10:36 AM)
Sorry, must - !

Ewan Mac Mahon: (10:38 AM)
There was an interesting point in there about proprietary software too - does your licence even allow you to archive your stuff?

David Crooks: (10:38 AM)
Yep

Ewan Mac Mahon: (10:39 AM)
GOG started off entirely selling packages of DOS games bundled with DOSBox.

Jens Jensen: (10:39 AM)
Still do with some games. They run with 100% CPU. Only now it's a core :-) E.g. Zork
Ewan Mac Mahon: (10:40 AM)
Ah, but so long as someone builds something in the future that allows us to run circa 2015 linux userspace code, we can just run dosbox on that, and the game on that. Turtles /all/ the way down. I think one of the other points in favour of 'stash a vm' is that it's quite 'rich'; you don't try to predict which things future data archeologists will find useful, you just throw everything at them and let them pick through it.

Gareth Douglas Roy: (10:42 AM)
Along the turtles line... funny if you haven't seen it: https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript

Ewan Mac Mahon: (10:45 AM)
So far my timescale is 'in time to be useful', which does indeed need some firming up.

I'm not sure that direct streaming sounds like the best possible approach. I think we usually suggest people stream to a local 'T0' right next to their kit and do batch transfers from there. The other thing adds a lot of potential failure points that can effectively take your experiment offline.
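[A footnote on Ewan's last point: the 'local buffer plus batch transfer' pattern is roughly what a minimal sketch like the following does - the spool directory, destination endpoint and retry count are all illustrative:

    #!/usr/bin/env python
    # Sketch of "stream to a local buffer, then batch-transfer" for an instrument:
    # files land in SPOOL_DIR locally; this loop pushes them to grid storage with
    # gfal-copy and only removes the local copy once the transfer has succeeded.
    # The spool directory and destination endpoint below are placeholders.
    import os
    import subprocess
    import time

    SPOOL_DIR = "/data/spool"                                    # local 'T0' buffer
    DEST_BASE = "srm://SOME-SE.example/dpm/example/home/vo/raw"  # placeholder endpoint
    RETRIES = 3

    while True:
        # in practice, only pick up files the instrument has finished writing
        for name in sorted(os.listdir(SPOOL_DIR)):
            src = "file://" + os.path.join(SPOOL_DIR, name)
            dst = DEST_BASE + "/" + name
            for attempt in range(RETRIES):
                if subprocess.call(["gfal-copy", src, dst]) == 0:
                    os.remove(os.path.join(SPOOL_DIR, name))
                    break
            # a file that fails all attempts stays in the spool for the next pass
        time.sleep(60)

The point is exactly the one made above: the instrument only ever talks to local disk, so a wobbly wide-area link delays the catch-up rather than taking the experiment offline.]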