Storage Minutes, 13 April 2011

Brian - ATLAS PRODDISK has been filling up because T1 was missing a release. Brian is looking at formalising the ATLAS quota size. This is also a use case for CVMFS (in that, at the least, all sites would then have the same software).

Hopefully we'll see Sonar test improvements, due to the increase in parallel streams on STAR-T2 channels (which are what Sonar uses for most of the T2s).

Wahid raised the issue of FTS tuning for startup time and async latency. Brian - we've shown before that if an SE can run synchronously, that improves performance. SEs aren't synchronous because they have to accommodate tape latency in T1 use cases - but this isn't an issue at T2s. Even in asynchronous mode, there are potential gains to be made by decreasing the polling delay between requesting a transfer and checking whether it has succeeded. This is a potential issue for sites (say, QMUL) with only one gridftp server handling all their requests.

QMUL have a problem with their new router - it is giving a gigabit simplex, not duplex. Also, it can take a considerable time for the checksum to be calculated at the end of a transfer, which can cause FTS to time out. Chris asked that verify-checksum mode be turned on for QMUL FTS ATLAS transfers - it seems to be off at the moment (probably because StoRM was a special case for some time).

Metrics discussion (designed to test the effectiveness of the Storage group, not individual sites). Some tweaks:

Publication - extend to "publications of similar quality".
Blogging - extend to other outreach activities.

There was a long discussion/argument about what kind of metrics we want (and whether we should overlap with the site performance metrics). Ewan (and Brian) wanted process-related metrics (i.e. reward working on the DPM toolkit, as we approve of that). Wahid and Chris wanted more outcome-related metrics (i.e. reward us for getting the UK sites to have good overall storage performance).

Some sample statements:

Chris: I agree with Wahid.
Step back - what's the point of the storage group? It's to make sure that we can deal with the demands on the storage! Tools, monitoring of the storage, and consistency checks are things we should do more of; the RAL tests were a good thing. How do you make a metric out of "solving problems when we discover them"?

Ewan: you don't want a metric of "bugs fixed", as that assumes we'll have bugs to fix. The problem with the storage performance stuff is that it's already folded into the site performance metrics. The group metrics need to be more group-based.

Wahid: but if you consider storage availability across the UK, then that's a measure of our ability to get things to work.

Brian: obviously, if there are metrics that have already been calculated, trying to ensure those metrics are met can itself be a metric ("a metric about getting sites to meet the site metrics").

Ewan: there's a philosophical divide on what we should be metricising: group activities vs outcomes.

Brian: some of the metrics are imposed on us (they are interested in both aspects!), and the fewer we come up with ourselves, the more we may have forced upon us.

The eventual decision went as far as: UK Storage Performance metrics (hard numbers), plus fluffier items (things we think we want to be doing), where the outcome is more judgement-based (did this work?, not "number of releases" etc.). It was also agreed that metrics should encourage support for small VOs.

In general, we deferred the rest of the discussion until a meeting with Jens present.

Sam