Attending: Gareth, John H, Marcus, Jens (chair+mins), Winnie, Robert, Tom Byrne, Tom W, Sam, Duncan, Daniel, Brian, Raul, Matt D, Pete, Steve

0. Only three blog posts this quarter. No operational issues to report.

1. Round Table

Gareth: not much to report, storage-wise.

Duncan: data transfers for DIRAC. Can now submit to FTS from DIRAC; unfortunately the system was designed to use a "shifter's proxy" rather than the user's proxy, so it needs a rethink. Also interested in T2Cs: while CPU efficiencies are normally the purview of the dteam meeting (aka TB-SUPPORT), T2Cs could put additional load on other sites, which is what we are aiming to test with the Oxford experiment. Duncan noted that QMUL had been copying files over from RHUL - a large number of files per job - where remote IO might be the better option, and he thinks the overall CPU efficiencies will also be a measure of data transfer/access efficiency. Brian points out that with remote access, latency becomes a factor, as does the increased load on the T2Ds. Hence the need to experiment, which should take into account file sizes, the ability (or not) to share files between jobs, etc. US sites are "looking for EU involvement". Action: Sam to prod Ilya about the above.

John: checksum stuff seems to be sorted. Aiming to sort out puppet config issues, but maybe not before Christmas as other things have higher priority. Currently just the single pool on 1.8.10 using SL5.

Marcus: not much operationally; some hardware is old and procurement may happen eventually. Test DPM available; also interested in distributed storage in general.

Winnie: asked whether other sites were running dmlite with HDFS. In general it should be possible (see chat) as long as a plugin is available - or one could of course write a plugin; there is also general interest in one for CEPH. It should even be possible to have different pools on different file systems.

Robert: two pool nodes down. Trying puppet. DPM 1.8.9.

Sam: looking at the ATLAS namespace dump problem. Apparently the script trawls the entire database every time a dump is created for an endpoint, and it needs to create 19 of them. Each trawl takes an hour, so 19 hours for something that should only be done once - not particularly efficient (see the sketch after the round table). Also DPM upgrade testing and testing the redirectors.

Matt D: pool nodes currently installed by hand; will move to Ansible. Going to 1.8.10.

Raul: no news other than what was reported last week. Can help with Ansible.

Steve: old hardware. Should the balance between CPU and disk be revisited? Duncan mentioned the commitment is by T2 (not by individual site, so NorthGrid, SouthGrid, London, ScotGrid). There is a spreadsheet; see Pete's links in chat.

Pete: old disk servers decommissioned, 80 TB.

Daniel: planned swap over to Lustre, almost finished and scheduled with ATLAS for next week. Also discussing T2D with LHCb. However, with SRM LHCb could determine whether access was local, but apparently not in the SRM-less case - worth clarifying with LHCb (i.e. Raja) at the next opportunity.

Brian: working on the CASTOR namespace dumps [there are two flavours: filenames only, and also metadata such as length, timestamp and checksums] - empty directories slow down the generation of the dump, which can take 3-4 hours for something like ATLASDATADISK. Also tracking PRODDISK removal at T2s, plus the WAN tuning and DiRAC. And dteam on CASTOR [the service class has been changed] and the test S3 Echo endpoint - which could be an interesting topic for next Wednesday?

Jens: DiRAC technical discussions with Sam, Brian, Lydia; Leicester is about to start, too, so updated instructions are needed. Probably need to start over with Durham, as discussed last week. Can a file split across tarball segments be recovered? Jens' tests (on DIRAC-USERS) said no; Sam will try to replicate (see the sketch below). A file wholly contained in a single segment can be extracted even if tar reports an error. Also liaising with Liverpool re Alloy 2, which is now called Indigo (no relation to INDIGO-DataCloud); Alloy 1 was based on iRODS and Alloy 2 - Indigo - on Cassandra.
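On Sam's and Brian's namespace-dump items above: one way to avoid trawling the namespace once per endpoint would be a single full trawl whose records are partitioned into the per-endpoint dump files in one pass. The sketch below is purely illustrative and is not the ATLAS/DPM dump script itself; it assumes the full trawl is already available as a plain-text file with the path in the first column, and the prefix-to-dump mapping (ENDPOINTS, with made-up paths and file names) is hypothetical.

    # Hypothetical sketch: produce per-endpoint namespace dumps from ONE full
    # trawl instead of re-trawling the namespace once per endpoint.
    import sys
    from contextlib import ExitStack

    # Illustrative mapping of namespace prefixes to dump files (19 in practice).
    ENDPOINTS = {
        "/dpm/example.ac.uk/home/atlas/atlasdatadisk/": "atlasdatadisk.dump",
        "/dpm/example.ac.uk/home/atlas/atlasscratchdisk/": "atlasscratchdisk.dump",
        "/dpm/example.ac.uk/home/atlas/atlaslocalgroupdisk/": "atlaslocalgroupdisk.dump",
    }

    def partition(full_dump_path):
        """Read the full namespace dump once, appending each record to the
        dump file of whichever endpoint prefix it falls under."""
        with ExitStack() as stack:
            outputs = {prefix: stack.enter_context(open(fname, "w"))
                       for prefix, fname in ENDPOINTS.items()}
            with open(full_dump_path) as full:
                for line in full:
                    if not line.strip():
                        continue
                    path = line.split()[0]          # first column: file path
                    for prefix, out in outputs.items():
                        if path.startswith(prefix):
                            out.write(line)
                            break                   # a file belongs to one endpoint

    if __name__ == "__main__":
        partition(sys.argv[1])   # e.g. python partition_dumps.py full_namespace.txt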
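On the DiRAC tarball-segment question in Jens' item: a small, self-contained way of replicating the test with Python's tarfile module, assuming the segments are produced by cutting a plain (uncompressed) tar at a fixed byte offset. File names, sizes and the segment size are made up; the expectation, matching what was reported, is that files wholly inside the first segment come back while the file cut at the segment boundary does not.

    # Sketch of the test: build a tar, cut it into fixed-size segments, then
    # see what can be recovered from the first segment alone.
    import os
    import tarfile

    SEGMENT_SIZE = 1024 * 1024   # 1 MiB segments (illustrative)

    # 1. Create some test files and a plain (uncompressed) tar containing them.
    for name, size in [("a.dat", 300000), ("b.dat", 300000), ("c.dat", 900000)]:
        with open(name, "wb") as f:
            f.write(os.urandom(size))
    with tarfile.open("archive.tar", "w") as tar:
        for name in ("a.dat", "b.dat", "c.dat"):
            tar.add(name)

    # 2. Cut the archive at a fixed offset, as splitting it into segments would.
    with open("archive.tar", "rb") as f:
        first_segment = f.read(SEGMENT_SIZE)
    with open("segment_00", "wb") as f:
        f.write(first_segment)

    # 3. See which members can be read back from the truncated first segment.
    recovered, lost = [], []
    with tarfile.open("segment_00", mode="r:") as seg:
        while True:
            try:
                member = seg.next()
            except (tarfile.ReadError, EOFError):
                break                       # no further headers past the cut
            if member is None:
                break
            try:
                data = seg.extractfile(member).read()
                complete = (len(data) == member.size)
            except (tarfile.ReadError, EOFError):
                complete = False            # tar(file) complains at the truncation
            (recovered if complete else lost).append(member.name)

    print("recovered:", recovered)   # expect a.dat and b.dat
    print("lost:", lost)             # expect c.dat, split across the boundary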
Chat log:

Gareth Douglas Roy: (02/12/2015 10:10:14)
And Janet's problems yesterday

Brian Davies @ RAL: (10:11 AM)
so there was a report that direct io rates were inversely proportional to the rtt of the distance

Samuel Cadellin Skipsey: (10:11 AM)
because it's seeky, and therefore latency limited, yeah

Paige Winslowe Lacesso: (10:21 AM)
Microphone no work :(

Duncan Rand: (10:22 AM)
ATLAS FAX CHEP: https://indico.cern.ch/event/214784/session/6/contribution/265/attachments/341115/476003/atlas-fax-chep-2013-v3.pdf

Paige Winslowe Lacesso: (10:24 AM)
One thing I want to know: Bristol is AFAIK the only UK site running dmlite with an hdfs backend; my impression is other sites are interested in this but want to see "how it goes". Given that dmlite+hdfs is going pretty well (I think) at Bristol, are any other UK sites thinking of doing the same?

Jens Jensen: (10:24 AM)
Thanks, Duncan. Thanks, Winnie. Don't think anyone else is doing hdfs like that?

Paige Winslowe Lacesso: (10:28 AM)
Blow me down. Somehow my impression was that other UK sites were looking with interest at dmlite+hdfs. So much for impressions...

Daniel Peter Traynor: (10:28 AM)
everything is interesting

Samuel Cadellin Skipsey: (10:29 AM)
Winnie: the issue is making the change to an entirely different storage infrastructure layer.

Duncan Rand: (10:29 AM)
Raul, you sound like you are in the Brazilian jungle

Samuel Cadellin Skipsey: (10:29 AM)
HDFS is certainly a better storage solution than the "DPM filesystem" approach, but the cost of moving is large.

raul: (10:30 AM)
Kind of. I was in the jungle last week. Real Brazilian jungle.

Samuel Cadellin Skipsey: (10:30 AM)
But we're definitely interested in how well dmlite + (proper distributed parallel storage solution) works.

Duncan Rand: (10:31 AM)
ceph

Samuel Cadellin Skipsey: (10:31 AM)
Or, indeed, ceph, but there's no official effort in DPM Core for supporting a ceph backend. (Which pains me, too.)

Matt Doidge: (10:31 AM)
I thought dmlite was filesystem agnostic?

Samuel Cadellin Skipsey: (10:32 AM)
Well, it can support any filesystem/storage system, with an appropriate plugin. There's a "generic VFS" plugin, but I have no idea how well it would work against ceph.

Daniel Peter Traynor: (10:32 AM)
lustre? could add a test dmlite in front of our lustre if that would be of interest

Peter Gronbech: (10:32 AM)
The MoU from August this year, which is near enough, is here: https://archive.gridpp.ac.uk/deployment/status/reports/reports.html

Samuel Cadellin Skipsey: (10:33 AM)
Dan: sure, lustre should work. It's quite posixy, so I guess the standard plugins would work with it.
(The hdfs plugin understands hdfs, on the level of "bits of this file are on nodes X, Y, Z, so I can transfer the file optimally from services running on X, Y or Z.")

Daniel Peter Traynor: (10:34 AM)
would need a non-puppet config method
is there some docs/web/wiki on dmlite?

Samuel Cadellin Skipsey: (10:36 AM)
Yeah, one mo. https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite is the root of the docs
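A footnote to the dmlite plugin discussion above: dmlite selects its backends through plugin-loading directives in its configuration (typically /etc/dmlite.conf plus the files it includes from /etc/dmlite.conf.d/), so "dmlite in front of HDFS or Lustre" essentially means loading a different pool-driver plugin alongside the usual namespace plugin. The snippet below is only a rough sketch of the shape of such a file; apart from the LoadPlugin directive itself, the plugin and parameter names here are illustrative and should be checked against the documentation Sam linked.

    # /etc/dmlite.conf.d/hdfs.conf -- illustrative sketch, names not guaranteed
    # Namespace still served by the usual DPM/MySQL catalogue plugin:
    LoadPlugin plugin_mysql_ns /usr/lib64/dmlite/plugin_mysql.so

    # The pool-driver plugin selects the storage backend; swapping this line
    # (and its parameters) is what changes the backend under dmlite:
    LoadPlugin plugin_hdfs_pooldriver /usr/lib64/dmlite/plugin_hdfs.so

    # Backend-specific parameters (hypothetical names):
    HdfsNameNode namenode.example.ac.uk
    HdfsPort 8020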