Attending: Elena, Ewan, Gareth, Jens, JohnB, JohnH, Matt, Robert, Wahid, David, Sam, Steve, Govind, ChrisW, Raja.

0. Operational blog posts. Only four posts this quarter, and two weeks to go: we need a few more (ideally twice as many!)

Revisiting Liverpool's thrashing problem, discussed last week. Files are too big... it is xrootd traffic, although there is some background GridFTP. Users are using direct I/O (aka file access). There is a patch for xrootd, but it has not made it into even xrootd 4. Also, Sam's Secret Recipe makes it possible to limit threads; it is said to be safe but needs testing - this could be done at-risk. There is also a cgroup controller for block I/O (blkio), which might be another useful approach.

There was a discussion about the size of a RAID set (number of drives); Lustre recommends 16+2. Chris has published details at HEPiX. Sam suggested smearing out the load, which will also help. Chris has set the read-ahead buffer higher than the default; this may also help. Liverpool have already tried that, but it is useful to know about these tricks - they should be documented in the wiki, and if they aren't there, they should be added.

1. Last week's GDB on data management? http://indico.cern.ch/event/272777/ xrootd 4 needs an updated client and server to take advantage of the new features (see the link for the list). There are packaging issues on SL6. There will be a pre-GDB in December on the "protocol zoo", asking in particular whether we could use more standard protocols (again).

2. Wiki... still no storage updates... or information for "small" VOs.

3. Pondering the role of "other" storage systems in a future infrastructure, and interoperation with other storage gri^W clo^W infrastructure. CEPH is being seen (and will soon be used) as D1T0 storage at RAL, not just for the T1 but also for STFC's other facilities (synchrotrons and the like). Sam's opinion is that most of the current grid storage is stuck with 1990s technology, and at least the small VOs could prefer S3 [even if S3 is not a standard]; HTTP may also be an option. Small VOs don't always need the performance the large ones do, either, so they could settle for easier interfaces.

4. Volunteers to talk about technical topics? Requested topics:
- iRODS
MEW will have a section on Lustre. MEW is in (or just outside) Coventry in early December; if anyone is going, could they let Chris know?

5. AOB. There is an ATLAS software week: https://indico.cern.ch/event/276501/ although it may require the secret ATLAS handshake to join. Wahid suggested the following topic for next week: https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/GridFTP

Ewan Mac Mahon: (17/09/2014 10:02:31) I'm just going to leave this here again, for anyone that's not already done the DPM / Argus thing: http://gridpp-storage.blogspot.co.uk/2014/08/argus-user-suspension-with-dpm.html Though actually, I think most people here already have.

Samuel Cadellin Skipsey: (10:03 AM) I endorse Ewan's endorsement.

Gareth Douglas Roy: (10:06 AM) Has anyone tried running xroot in a cgroup and throttling blkio? Something like:

    mount {
        blkio = /cgroup/iolimit;
    }
    group iocap1 {
        blkio {
            # Limit reads from /dev/sda1 to 50 MB/s
            blkio.throttle.read_bps_device="8:1 52428800";
        }
    }

Matt Doidge: (10:08 AM) No - but this is relevant to our interests. Thanks!

Gareth Douglas Roy: (10:08 AM) http://docs.oracle.com/cd/E37670_01/E37355/html/ol_use_cases_cgroups.html

Ewan Mac Mahon: (10:09 AM) Yes. Where 'occasionally' is more like 'every other syllable'.

wahid: (10:09 AM) how many iops
ah right

Ewan Mac Mahon: (10:10 AM) It's actually remarkably intelligible.
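For reference, a minimal sketch of how a group like Gareth's "iocap1" could be applied to an xrootd process once cgconfig has created it, assuming libcgroup's cgexec/cgclassify tools are installed; the xrootd config path is just a placeholder and none of this has been tested in anger:

    # Move an already-running xrootd daemon into the throttled group
    cgclassify -g blkio:iocap1 $(pidof xrootd)

    # Or start a process inside the group from the outset
    # (the config path here is only an example)
    cgexec -g blkio:iocap1 /usr/bin/xrootd -c /etc/xrootd/xrootd.cfg

    # Confirm the group has picked up the bandwidth limit
    cat /cgroup/iolimit/iocap1/blkio.throttle.read_bps_device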
Gareth Douglas Roy: (10:10 AM) blkio.throttle.write_iops_device

Ewan Mac Mahon: (10:10 AM) It's not turned its write cache off or something catastrophic?

wahid: (10:10 AM) it's reading - so is that catastrophic? (I ask as I have some disk servers where the write cache gets turned off due to (fake) battery failure and it didn't seem to harm them)

Ewan Mac Mahon: (10:11 AM) Depends on the card, I think; some of them can throw a bit of a wobbly when they're in an 'odd' state. And we've seen things in the past where one direction would starve the other, so either slow writes would block reads, or vice versa. I just tend to agree with Sam that they seem a bit slow and there may be an underlying problem to solve.

Jens: (10:16 AM) Also, if you throttle access, things might time out. Timeout => failed transfer.

Ewan Mac Mahon: (10:16 AM) Do we know how to throttle webdav? @Jens - that's certainly true if there are actually more requests than the system can handle, but there does seem to be a middle ground in which the load is enough to make the system bog down and run extra slowly, at which point the problem just snowballs.

Samuel Cadellin Skipsey: (10:18 AM) Also, re: blkio cgroups, you can also provide iops limits. See: https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt

Gareth Douglas Roy: (10:18 AM) and Red Hat's docs: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html

John Bland: (10:19 AM) that could be useful - and not just for me, we're all susceptible to this problem

Samuel Cadellin Skipsey: (10:19 AM) Indeed, we're looking at cgroups and namespaces for a lot of things here at the moment. They're really very useful.

wahid: (10:20 AM) indeed. Not everyone is suffering, so it could also be the Liverpool queue or something on the ATLAS AGIS side (though I looked and couldn't see anything). This is the wiki page for tuning: https://www.gridpp.ac.uk/wiki/Performance_and_Tuning - people are welcome to update it, of course!

Ewan Mac Mahon: (10:21 AM) I keep meaning to look into them more, but I'm slightly wary given the apparent move to a whole new API in recent/future things.

Samuel Cadellin Skipsey: (10:21 AM) I might add some text on cgroups as a "thing to try" to the wiki.

Jens: (10:21 AM) http://indico.cern.ch/event/272777/

Samuel Cadellin Skipsey: (10:22 AM) Ewan: sure, there's now the "unified controller" approach for cgroups. But that's in 3.14 onwards, IIRC (and in any case, it just changes *where* you apply the hierarchical settings, for the most part). It also doesn't, for example, change the kernel namespaces implementation, which is orthogonal to cgroups and just as useful.

Ewan Mac Mahon: (10:23 AM) Indeed. The 'single controller' thing is in RHEL7 though, I think?

wahid: (10:23 AM) https://indico.cern.ch/event/276501/ Networking: overview of T2D,

Samuel Cadellin Skipsey: (10:24 AM) Ewan, I think they backported it, yes. Red Hat is very pro-cgroups. (Which is another reason we should actually use them for things.)

John Hill: (10:26 AM) It let me see it only when I logged in

Jens: (10:26 AM) https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/GridFTP

Ewan Mac Mahon: (10:26 AM) Indeed. And I should add that my wariness is around making anything that's too tightly bound to the current implementation; I don't want to avoid them entirely. LWN also have a series on cgroups at: http://lwn.net/Articles/604609/

wahid: (10:27 AM) sorry !
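A sketch of the iops variant that Gareth and Sam point at, written straight into the cgroup filesystem rather than via cgconfig, plus the read-ahead tweak from item 0 - the sort of thing that could usefully go on the Performance_and_Tuning page. The device (8:16, i.e. /dev/sdb) and the numbers are purely illustrative:

    # Cap reads on /dev/sdb to 1000 IOPS for the iocap1 group
    echo "8:16 1000" > /cgroup/iolimit/iocap1/blkio.throttle.read_iops_device

    # And writes, if needed
    echo "8:16 1000" > /cgroup/iolimit/iocap1/blkio.throttle.write_iops_device

    # Read-ahead: show the current value (in 512-byte sectors), then raise it
    blockdev --getra /dev/sdb
    blockdev --setra 8192 /dev/sdb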
Ewan Mac Mahon: (10:27 AM) Which is probably good, but I've not had the time to go through it yet.

Samuel Cadellin Skipsey: (10:27 AM) Oh, indeed, I understand completely. (LWN also have a series on namespaces here: http://lwn.net/Articles/531114/ )

wahid: (10:27 AM) yes - now one can just use the headnode. Sam can explain! Now I want to test this week at Edinburgh - maybe we can talk about it next week? Shaun could come.

Jens: (10:28 AM) Yes, let's

wahid: (10:28 AM) thanks

Ewan Mac Mahon: (10:29 AM) So, with this gridftp redirect thing, should we be putting the headnodes back in the GOCDB as classic SEs?

wahid: (10:29 AM) haha - well, they can still use SRM or just gridftp, but that's always been possible

Samuel Cadellin Skipsey: (10:29 AM) We jest, but actually I think they might work as pure gridftp classic SEs, yes.

wahid: (10:29 AM) this is just more performant. Anyway, it works - but you need 1.8.8 everywhere - and I haven't checked it really works (in production, with ATLAS workloads etc.), so that is what I will do

Ewan Mac Mahon: (10:31 AM) I'm only semi-jesting too; if we can make the classic SE a respectable interface again, we can potentially have people run actual classic SEs, which are a decent proposition these days if you run them over cluster filesystems.

Samuel Cadellin Skipsey: (10:31 AM) I think we can allow Ian the small lapse in judgement.

wahid: (10:32 AM) 'Will be'? Or 'might be'? Let's not get so excited - just a year ago we were saying it wasn't ready.

Ewan Mac Mahon: (10:32 AM) And we do list things in the GOCDB multiple times if they have multiple interfaces; our SE is already in there as an SRM and as an xrootd, for example.

wahid: (10:33 AM) but DPM works - and we get some advantages from that (not that Lustre doesn't work)

Ewan Mac Mahon: (10:34 AM) We do, but mostly in terms of interoperability with things that expect weirdy grid interfaces.

Christopher John Walker: (10:34 AM) I keep trying to get the DPM guys to do something that would work on top of Lustre - or another distributed filesystem.

wahid: (10:34 AM) it does work well if you just mount it on a node. Exactly - that's what I am doing with the GPFS RDF. We are coming up with a solution where reading is directly over GPFS, so that part won't be throttled, but FTS will have a certain number of gateways - that's fine, I think (and the same as StoRM + Lustre).

Christopher John Walker: (10:36 AM) It provides an alternative to StoRM.

wahid: (10:37 AM) or swear at it! The other topic, obviously, is Ceph again - if we are serious - or even Lustre...

Ewan Mac Mahon: (10:38 AM) Maybe the iRODS talk could also be a guest blog post....
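On the gridftp-redirect point above: once everything is at 1.8.8, one quick way to poke at a DPM head node as a plain GridFTP endpoint is a direct globus-url-copy against it. The host and path below are invented, so treat this as the shape of the test rather than a verified recipe:

    # Pull a file straight from the (hypothetical) head node over GridFTP
    globus-url-copy -vb gsiftp://dpm-head.example.ac.uk/dpm/example.ac.uk/home/dteam/testfile file:///tmp/testfile

    # And push one back
    globus-url-copy -vb file:///tmp/testfile gsiftp://dpm-head.example.ac.uk/dpm/example.ac.uk/home/dteam/testfile.copy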