
Latest version: 2.1.13-6

 

Introduction

CASTOR, the CERN Advanced STORage manager, is a hierarchical storage management (HSM) system developed at CERN and used to store physics production files and user files. Files can be stored, listed, retrieved and accessed in CASTOR using command line tools or applications built on top of the different data transfer protocols, such as RFIO (Remote File IO), the ROOT libraries, GridFTP and XROOTD. CASTOR manages disk caches and the data on tertiary storage (tapes). Currently (2007) there are some 60 million files and about 7 petabytes of data in CASTOR.


CASTOR provides a UNIX-like directory hierarchy of file names. The directories are always rooted at /castor/cern.ch (the cern.ch part differs at other CASTOR sites). The CASTOR name space can be viewed and manipulated only through CASTOR client commands and library calls; OS commands like ls or mkdir will not work on CASTOR files. The CASTOR name space records the permanent tape residence of CASTOR files, while the more volatile disk residence is known only to the stager, the disk cache management component of CASTOR. When accessing or modifying a CASTOR file, one must therefore always go through a stager.
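
As an illustration, the following is a minimal sketch of reading a CASTOR file through the RFIO C client API. It assumes the CASTOR client libraries are installed and a stager is configured for the session (for example via the STAGE_HOST and STAGE_SVCCLASS environment variables); the file path shown is a hypothetical example.

    /* Minimal sketch: reading a CASTOR file through the RFIO C API.
     * Assumes the CASTOR client headers and libraries are available
     * and a stager is configured; the path below is hypothetical. */
    #include <stdio.h>
    #include <fcntl.h>
    #include "rfio_api.h"

    int main(void)
    {
        char buf[4096];
        int  n;
        /* rfio_open mirrors POSIX open(); the stager resolves the
           name-space path to a disk copy, recalling it from tape if needed. */
        int fd = rfio_open("/castor/cern.ch/user/j/jdoe/example.dat", O_RDONLY, 0);
        if (fd < 0) {
            rfio_perror("rfio_open");
            return 1;
        }
        while ((n = rfio_read(fd, buf, sizeof(buf))) > 0)
            fwrite(buf, 1, n, stdout);   /* copy the contents to stdout */
        rfio_close(fd);
        return 0;
    }

Such a program would be linked against the CASTOR client library; the same operations are also available from the command line with tools such as rfcp and nsls.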

 

Design and Architecture

CASTOR2 has a modular design with a central database for information handling. The main functionality can be grouped into five modules (architecture.png), briefly introduced as follows:

  • The Stager has the primary role of a disk pool manager: it allocates space on disk to store a file, maintains a catalogue of all the files in its disk pools, and clears out old or least recently used files in these pools when more free space is required (a sketch of this clean-up is given after this list). A disk pool is simply a collection of file systems - from one to many.

  • The principal role of the Name Server is to implement the hierarchical view of the CASTOR name space, so that files appear to live in directories. Names are made up of components (the elements between slashes) and, for each component, file access permissions, file size and access times are stored. The name server also remembers the file location on tertiary storage if the file has been migrated from the disk pool in order to make space for more current files. Files may be segmented, i.e. made up of more than one contiguous chunk of tape media (pictured in the second sketch after this list). This allows the full capacity of tape volumes to be used and permits file sizes bigger than the physical limit of a single tape volume. Additionally, the name server provides the ability to create directories and files, change ownership, etc., as specified in the POSIX standard.

  • Tertiary storage (tapes)

    High performance cartridge tapes are used for tertiary storage in CASTOR. The tape drive types currently in use are the Sun 9940B (200 GB), the IBM 3592 E05 (700 GB) and the Sun T10000 (500 GB). Tapes are housed in libraries (robots); the latest library models used by CASTOR are the Sun SL8500 and the IBM 3584. Legacy STK Powderhorn libraries will be phased out in 2007-8.

    The CASTOR Volume Manager database contains information about each tape's characteristics, capacity and status. The Name Server database mentioned above contains information about the files (sometimes referred to as segments) on a tape, their ownership and permissions, and the file offset location on the tape. User commands are available to display the information in both the Name Server and Volume Manager databases.

    The mounting and dismounting of cartridges is managed by the Volume Drive Queue Manager (VDQM), in conjunction with library control software specific to each model of tape library.

    The cost of tape storage per gigabyte is still much lower than that of hard disk, and tape has the advantage of being considered "permanent" storage. However, the access times seen by users are of the order of 1-10 minutes rather than 1-10 seconds.

  • The database plays a central role in the CASTOR2 design. Its schema is quite complex, with about sixty tables and several procedures and triggers. The role of the database is to hold the current status of ongoing requests, so that the other components can remain stateless (see the entity-relation diagram).

  • The Client allows you to interact with the system and use its basic functionality. You can get a file stored on a disk server or tape server using RFIO, ROOT, GridFTP or XROOTD (coming soon); you can also put a file, check its status or update it. The client can be a CASTOR client (command line or API) or SRM.

    The Storage Resource Manager (SRM) is a middleware component for managing shared storage resources on the Grid. It provides dynamic space allocation and file management, and gives uniform access to heterogeneous storage elements, of which CASTOR is one.
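
To make the stager's clean-up policy concrete, here is a minimal sketch (not CASTOR code) of how least-recently-used clean-up over a disk pool could work; the structures, names and threshold below are invented purely for illustration.

    /* Illustrative sketch of least-recently-used clean-up in a disk pool.
     * This is NOT the CASTOR stager implementation; names and the
     * free-space threshold are invented for the example. */
    #include <stdio.h>
    #include <time.h>

    struct disk_copy {
        const char *path;      /* location of the copy on a pool file system */
        long long   size;      /* bytes occupied */
        time_t      last_used; /* last access time recorded in the catalogue */
    };

    /* Evict the least recently used copies until enough space is free.
     * Only disk copies are removed; the tape copy recorded in the
     * name server remains, so the file can be recalled later. */
    static long long garbage_collect(struct disk_copy *pool, int n,
                                     long long free_space, long long needed)
    {
        while (free_space < needed) {
            int oldest = -1;
            for (int i = 0; i < n; i++) {
                if (pool[i].size == 0)              /* already evicted */
                    continue;
                if (oldest < 0 || pool[i].last_used < pool[oldest].last_used)
                    oldest = i;
            }
            if (oldest < 0)
                break;                              /* nothing left to evict */
            printf("evicting %s (%lld bytes)\n", pool[oldest].path, pool[oldest].size);
            free_space += pool[oldest].size;
            pool[oldest].size = 0;                  /* mark as removed */
        }
        return free_space;
    }

    int main(void)
    {
        struct disk_copy pool[] = {
            { "/pool/fs1/fileA", 2000000000LL, 1167600000 },
            { "/pool/fs2/fileB", 5000000000LL, 1167700000 },
            { "/pool/fs1/fileC", 1000000000LL, 1167800000 },
        };
        long long free_space = garbage_collect(pool, 3, 500000000LL, 3000000000LL);
        printf("free space after clean-up: %lld bytes\n", free_space);
        return 0;
    }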
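
Likewise, the name-server view of a migrated file can be pictured as per-component metadata plus a list of tape segments. The following sketch only illustrates that model; it is not the actual name-server schema, and all field names are invented.

    /* Illustrative model of a name-server entry for a migrated file.
     * Field names are invented for the example; the real schema differs. */
    #include <sys/types.h>
    #include <time.h>

    struct tape_segment {
        char      vid[7];      /* volume id of the tape holding this chunk */
        int       fseq;        /* file sequence number on that tape */
        long long bytes;       /* size of this contiguous chunk */
    };

    struct ns_entry {
        char      name[256];   /* last path component */
        uid_t     owner;       /* ownership, POSIX-style */
        mode_t    mode;        /* access permissions */
        long long size;        /* logical file size */
        time_t    atime, mtime;/* access and modification times */
        int       nsegments;   /* a large file may span several tapes */
        struct tape_segment segments[4];
    };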

For more information about the CASTOR architecture, please check the links in the 'documentation' tab.

The Distributed Logging Facility (DLF) is a powerful logging facility which allows advanced message filtering. The purpose of this tool is to streamline and centralise logging and accounting in CASTOR. The facility consists of a DLF server with an ORACLE or MySQL backend and a client API for writing log records using ULM (the standard Universal Logging Message format).
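
As a rough illustration of the ULM key=value style, the snippet below builds one such record. It is not the DLF client API; the field names follow the general ULM convention and the values are invented.

    /* Build a ULM-style log record as a key=value line. This only
     * illustrates the message format; it is not the DLF client API. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char record[512];
        char stamp[32];
        time_t now = time(NULL);
        strftime(stamp, sizeof(stamp), "%Y-%m-%dT%H:%M:%S", gmtime(&now));

        /* DATE, HOST, PROG and LVL are typical ULM fields; extra
           key=value pairs carry message-specific parameters. */
        snprintf(record, sizeof(record),
                 "DATE=%s HOST=%s PROG=%s LVL=%s MSG=\"%s\" REQID=%d",
                 stamp, "diskserver01", "stagerd", "Info",
                 "File staged to disk pool", 42);
        puts(record);
        return 0;
    }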


For comments and changes on this page send email to the Webmaster
Copyright CERN