Last reviewed and polished 24 Oct 2006.

Overview of T2 storage

* Only disk storage (no tape) is required at Tier 2s.
* The quality of disk is classified (in the new SRM terminology) as:
  - "Replica" (the user is expected to be able to recover the data easily from another site),
  - "Output" (data that would be expensive to recreate, e.g. job output),
  - "Custodial" (data that cannot be recreated and will not be lost unless a meteor hits the building).
* Different VOs require different CPU/storage ratios.
* Different sites support different VOs. Most sites support most or all GridPP VOs with one or more smaller allocations, and a main VO with most of their resources.
* All sites are expected to provide CPU and storage in a specific ratio, where the ratio depends on the VO (and possibly on time).
* As of Q2 2006, we understood these ratios were required:
  - Atlas 2:1
  - CMS 3:1
  - LHCb 333:1
  Here CPU is measured in KSi2K (kilo-SPECint2000) and storage in terabytes (10^12 bytes, we assume). Thus a site with a 600 KSi2K cluster dedicated wholly to CMS must provide 200 TB of storage.

Q: Are these ratios still valid?
Q: Will these ratios change? (We expect LHCb's will change at some point.)
Q: What quality of service do experiments expect?
  - Uptime we know ("95%").
  - Guaranteed allocations we know ("yes for the main VOs").
  - One or more of custodial, output, replica?
  - Other QoS factors?
  - Performance targets.

Currently the only way to guarantee that space is available to a VO is to allocate physical storage resources (e.g. a disk partition) to the VO and not share them with other VOs. Storage middleware does not support quotas.

Site recommendations:

* Sites should allocate the "obvious" amount of dedicated storage for the main VO (or main VOs) in which they are active (i.e. apply each VO's CPU/storage ratio to the fraction of the farm allocated to that VO) and allocate extra shared storage for the other GridPP VOs. Example of what "obvious" means, just in case it isn't obvious: suppose a site is 20% Atlas and 80% CMS, and the ratio is 2:1 for Atlas and 3:1 for CMS. Then if the farm has 600 KSi2K, the site needs to provide 600 * 0.5 * 0.2 + 600 * (1/3) * 0.8 = 60 + 160 = 220 TB of storage (a small sketch of this calculation is given below). This raises the question of how to provide storage for the remaining VOs that all sites support. In the absence of requirements saying otherwise, it seems sensible to provide a small percentage for the rest - say 2% or 5%. Example: Lancaster has 80% allocated to Atlas and 20% to everybody else (which includes Atlas). Example: Bristol has bought 1:1 CPU/storage hardware by cost, but high-end storage (GPFS too). They get 600 KSi2K and 150 TB, so 4:1, and are not so concerned about absolute numbers. Bristol is a CMS site, so it should really provide 3:1 (200 TB, assuming 100% CMS).
* When buying hardware, sites should aim to meet the CPU/storage ratio rather than the absolute numbers from planning documents and MoUs. Sites should share their purchasing and operational experiences with the rest of the storage group (but bear in mind that the storage mailing list is world readable), along with performance and optimisation hints.
* Performance targets - sites need to optimise their SEs and hardware.
* Keep the timescale in mind. For example, if LHCb requires almost nothing today but a lot next year, then sites supporting LHCb should take that into account.
* Sites should try to guarantee allocations for their major VO(s): ideally by physically dedicating partitions to them, but in general by allocating pools or pool groups (dCache) to them, not shared with other VOs.
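The worked ratio example above can be expressed as a minimal Python sketch. This is a cross-check of the arithmetic only, not a planning tool: the ratios are the Q2 2006 figures quoted earlier, and the function name, VO shares and farm size are illustrative.

    # Q2 2006 understanding: kSI2K of CPU per TB of disk, per VO.
    CPU_PER_TB = {"atlas": 2.0, "cms": 3.0, "lhcb": 333.0}

    def storage_target_tb(total_ksi2k, vo_shares):
        """Disk (TB) implied by the CPU/storage ratios.

        vo_shares maps VO name -> fraction of the farm allocated to that VO.
        """
        return sum(total_ksi2k * share / CPU_PER_TB[vo]
                   for vo, share in vo_shares.items())

    # Worked example from the text: 600 kSI2K farm, 20% Atlas and 80% CMS.
    print(storage_target_tb(600, {"atlas": 0.2, "cms": 0.8}))   # -> 220.0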
* What is the best way to guarantee (physical) storage allocations?

Probably the best approach is to allocate large physical storage units to the main VO(s), but it might also be useful to have smaller physical units that can be reallocated, e.g. to support new VOs or to give VOs more guaranteed space temporarily. This would require middleware that can drain pools or transparently reallocate the storage, and even then it is a rather inefficient use of storage. (The FNAL production dCache operates with ~300 pools and hundreds of TB of disk.) Quotas are not likely to be implemented on a realistic timescale, and SRM spaces will not be supported by all SRMs, but high-level static SRM spaces (set up by the sysadmin) may be a reasonable way to provide allocations for VOs.

Big (mostly open) questions:

* How much are sites short of delivering the expected storage capacity for their main VO(s)?
* What quality of disk do they need to provide? Is it as simple as custodial/output/replica (mapped to hardware in some suitable way)?
* How can sites best enable storage on WNs? The storage group does not currently support distributed filesystems, but sites have been talking about Lustre, GPFS, ...
* What is the largest capacity that DPM can realistically manage?

More things to ponder:

* The NGS has a different model: two core sites provide extra storage, and the other two provide extra CPU. GridPP does not use that model: all sites must provide both storage and CPU.
* Pooling storage between Tier 2s is probably not a good idea (pool communication is usually insecure).
* There may be a risk that a site, aiming to meet an N TB target, will purchase cheap disk - some VOs will need higher-quality storage at Tier 2s as well (e.g. for databases).
* Are many sites finding their disk underused?
* Is it a problem that we are not meeting the absolute numbers (which are quite ambitious)?
* The cost of storage may include staff, space, power and air conditioning. Creating multiple copies of files, for resiliency or for parallel reading streams, means the physical capacity has to be N times larger.

Other comments:

* Sites should publish their quality of service according to best practice (the GLUE schema).
* SE implementations should publish available and used space per VO. For VOs sharing space, that space is potentially published twice (unless one cheats and publishes a fraction). Version 1.3 of the GLUE schema was meant to remedy this, but the subject of available and used space is still a hot potato: both the definition and the usefulness of "available" and "used" are under heated debate in the SRM protocol group.
* Sites should optimise their installations according to the recommendations from the storage group.
* The HEPiX storage task force is investigating prices and technologies.
* Middleware does not support quotas. We need to investigate middleware support for guaranteed reservations (spaces in SRM 2.2). A test case was proposed for S2 (the SRM 2.2 test client), but we do not know whether it was built. There are no guaranteed reservations in SRM 1.x.
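On the point above about publishing available and used space per VO: the following is a hypothetical sketch, not a supported tool, of how a site could inspect what it currently publishes by querying a site BDII for GLUE 1.x GlueSA records with python-ldap. The host name and VO are placeholders, and the units of the published values should be checked against the GLUE schema documentation rather than assumed.

    # Hypothetical helper: list the GlueSA records published for one VO and
    # print the used/available space values as published (units per the GLUE
    # schema definition of GlueSAStateUsedSpace/GlueSAStateAvailableSpace).
    import ldap  # python-ldap

    BDII_URI = "ldap://site-bdii.example.ac.uk:2170"   # placeholder site BDII
    VO = "atlas"                                       # placeholder VO

    conn = ldap.initialize(BDII_URI)
    conn.protocol_version = ldap.VERSION3
    conn.simple_bind_s("", "")                         # anonymous bind

    results = conn.search_s(
        "o=grid",
        ldap.SCOPE_SUBTREE,
        "(&(objectClass=GlueSA)(GlueSAAccessControlBaseRule=*%s*))" % VO,
        ["GlueChunkKey", "GlueSAStateUsedSpace", "GlueSAStateAvailableSpace"],
    )

    def first(attrs, name):
        """Return the first value of an attribute, decoded, or '?' if absent."""
        return attrs.get(name, [b"?"])[0].decode()

    for dn, attrs in results:
        print(dn)
        print("  SE:        ", first(attrs, "GlueChunkKey"))
        print("  used:      ", first(attrs, "GlueSAStateUsedSpace"))
        print("  available: ", first(attrs, "GlueSAStateAvailableSpace"))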