Attending: Matt, Jens (chair+mins), Winnie, Alastair, Vip, Teng, Wenlong, Sam, Marcello, Brian, RobA, Duncan, RobC, Dan

Operational issues: ATLAS storage at Glasgow temporarily lost a disk server to a sad RAID controller. Physical interventions now need extra time, both to get into the data centre and because of the extra precautions taken when working alone - a 4U disk server weighs 75 kg or thereabouts.

This discussion is therefore a roundtable view of the future of storage, as requested by Alastair, and of the T2s' plans, where known. Related to the lost files at Glasgow, one question is whether the SLA is against a site or against a T2 (ScotGrid in this case).

************************************************************************

The list of questions is:

1. Is storage shared with other users? (POSIX helps, as many local users require POSIX.)
2. What are the future directions of the site?
   - distributed filesystem
   - protocols - cf. the HTTP vs xroot discussion - sensitive topic!
   - software, hardware, future technologies?
   - role of the site in provisioning - cache only, full storage, CPU only, etc.
   - exit strategy - how to migrate off the existing solution:
     - copy everything out and in again
     - drop everything and let the experiment copy the data back in
     - something in between, recognising that some data is more precious than the rest
3. Who are your users?
   - We expect the LHC experiments (a suitable subset thereof) plus IRIS - let us know if not.
   - Anyone else? (The TWG discussed supporting COVID-19 research yesterday.)
4. Software-defined infrastructure - how does that help or hinder?
5. How to support site storage? (cf. previous discussions on the effort required to run DPM, EOS, etc.)
6. How (and whether) to engage with the wider storage community - industry, international, HEPiX.

************************************************************************

Going round the T2s, starting with ScotGrid:
- Glasgow - filesystems are not shared, but Glasgow would like them to be, and Ceph would be an opportunity.
- Edinburgh - also wishes to share storage, though not on the RDF - a bit of a lesson learned there. Local users need another PB; they want POSIX, perhaps isolated from Grid excesses (meaning transfers on/off site), etc.
- Durham (not present) - DPM, Gluster; will probably go cache-only for WLCG/grid.

Northgrid:
- Lancaster - IRIS storage, with options to move to another solution, depending. A group of xroot users; the others are on DPM.
- Liverpool (not present) - DPM 4ever.
- Manchester (not present) - obviously a large site; DPM.

Southgrid:
- Bristol - known situation; similar to the US sites in its use of HDFS, which is neither the best situation to be in (US practices sometimes differ) nor the worst (others do approximately the same thing). Clearly shared; problems are being resolved. Bristol needs to move off DPM, as DOME is unsupported for HDFS, and will do xroot with help from Sam.
- RALPP (not present) - dCache and XCache; need to check at the earliest opportunity.

Related to this, as discussed in recent weeks, dCache has long-standing support for pNFS and would therefore be a good option for a T2 supporting local use (see the sketch after this roundtable). Needs re-evaluating (again). There are only two dCache sites in the UK, but those sites are generally happy.

- Oxford - DPM and cache understood; Gluster for local users.
- Sussex (not present).

London:
- IC - also dCache, suggesting an opportunity for further dCache use in the UK (see the RALPP entry above).
- QMUL - Lustre is robust, StoRM is working well, and QMUL is switching to the nginx implementation. Saw performance issues on ZFS, though - about half the expected performance - which could be down to the need for tuning (see the ZFS note after this roundtable). Also expecting larger disk drives to matter (denser storage nodes).
- RHUL (not present) - similar to Oxford.
- Brunel (not present) - Raul known to have plans...
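As a minimal sketch of what the pNFS option could look like - assuming a stock dCache NFS 4.1 door, with illustrative domain, host and path names rather than anything from the meeting:

    # dCache layout file - enable the NFS 4.1/pNFS door
    # (illustrative domain name)
    [nfsDomain]
    [nfsDomain/nfs]
    nfs.version = 4.1

    # /etc/dcache/exports - export to local POSIX clients
    # (illustrative path and client pattern)
    /data *.example.ac.uk(rw,sec=sys)

    # On a local client node, mount it like any NFS 4.1 share:
    mount -t nfs -o vers=4.1 dcache-door.example.ac.uk:/data /mnt/data

Local users then get ordinary POSIX access while the grid doors talk to the same pools, which is the attraction for a T2 supporting local use.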
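On the QMUL ZFS point: the minutes do not record which settings were suspected, but a first pass at tuning for large sequential grid files would typically look like the following (a sketch only - the pool/dataset name is made up):

    # Hypothetical dataset; adjust names to the site's layout
    zfs set recordsize=1M tank/gridstore   # large records suit big sequential files
    zfs set atime=off tank/gridstore       # skip access-time writes on a read-heavy store
    zfs set xattr=sa tank/gridstore        # keep xattrs in the dnode (Linux/ZoL)
    # Watch ARC hit rates while re-running the transfer test
    arcstat 5

Whether tuning of this sort recovers the missing factor of two is of course for the site to measure.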
Chat log (or maybe that should be `chat' log - think French):

From Dewhurst, Alastair (STFC,RAL,SC) to Everyone: 10:03 AM
  can I just check that this is the storage meeting? I fear I have logged into a different meeting
From rob-currie to Everyone: 10:03 AM
  Well one of our disk servers has 9 lives :P