...
- Projects
- A project namespace (think of it as a cloud-based file system) where you and your team can securely store and share your data collection
- You can structure your project namespace however you wish
- You can associate discoverable meta-data with the structure and with the files that you upload
- Authentication
- For University of Melbourne staff and students, you can log in directly with your institutional credentials
- For other users, you can log in with a local account created for you
- Login via the Australian Access Federation is no longer supported since the AAF no longer supports the required enhanced SAML plugin.
- Authorisation
- Whatever account you log in with must have roles granted to it (by the Mediaflux support team) before it is authorised to access resources
- Standard roles are created per project (admin, read/create/destroy, read/create, read) and can be assigned to project team members; a conceptual sketch of these roles follows this list
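The standard roles above can be thought of as nested permission sets. The sketch below is purely illustrative: the role names come from the list above, but the mapping and the `can` helper are assumptions for explanation, not Mediaflux's internal role definitions.

```python
# Illustrative sketch only: models how the four standard project roles
# might map to permission sets. Role names follow the documentation above;
# the mapping itself is an assumption, not Mediaflux's internal definition.
from enum import Flag, auto

class Permission(Flag):
    READ = auto()
    CREATE = auto()
    DESTROY = auto()
    ADMIN = auto()

STANDARD_ROLES = {
    "read": Permission.READ,
    "read-create": Permission.READ | Permission.CREATE,
    "read-create-destroy": Permission.READ | Permission.CREATE | Permission.DESTROY,
    "admin": Permission.READ | Permission.CREATE | Permission.DESTROY | Permission.ADMIN,
}

def can(role: str, permission: Permission) -> bool:
    """Return True if the given project role includes the permission."""
    return permission in STANDARD_ROLES[role]

# Example: a 'read-create' team member can upload but not destroy assets.
assert can("read-create", Permission.CREATE)
assert not can("read-create", Permission.DESTROY)
```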
- Access Protocols - Projects can be accessed via
- HTTPS (browser-based access and various Java clients, e.g. Mediaflux Explorer, Mediaflux Data Mover, CLI clients, aterm); see the list of all access methods for more information
- SMB (i.e. a network file share)
- sFTP (e.g. FileZilla, CyberDuck, rclone); a minimal scripted example follows this list
- NFS (only after discussion with the RCS Data Team)
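For scripted sFTP access, a minimal Python sketch using the paramiko library is shown below. The hostname, port, credentials, and project path are placeholders, not real connection details; use the details issued for your project.

```python
# Minimal sketch of scripted sFTP access using the paramiko library.
# The hostname, port, credentials, and project path below are placeholders.
import paramiko

HOST = "mediaflux.example.unimelb.edu.au"  # placeholder hostname
PORT = 22                                  # placeholder port

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, port=PORT, username="your-username", password="your-password")

sftp = client.open_sftp()
# List the contents of a project namespace (placeholder path).
for name in sftp.listdir("/projects/proj-example-1234"):
    print(name)

sftp.close()
client.close()
```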
- Data Movement
- See details on uploads here
- See details on downloads here
- See the Data Mover capability; for scripted transfers, an rclone sketch follows this list
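For scripted uploads and downloads, the sketch below drives rclone (mentioned under the sFTP protocol above) from Python. The remote name `mediaflux-sftp` and all paths are placeholders you would first set up yourself with `rclone config`; this is a sketch, not an officially prescribed workflow.

```python
# Sketch of a scripted upload/download using rclone over sFTP.
# "mediaflux-sftp" is a placeholder remote name configured beforehand with
# `rclone config` (sftp backend); the local and remote paths are placeholders.
import subprocess

REMOTE = "mediaflux-sftp:projects/proj-example-1234"

# Upload a local directory into the project namespace.
subprocess.run(
    ["rclone", "copy", "/data/experiment-01", f"{REMOTE}/experiment-01", "--progress"],
    check=True,
)

# Download the same directory back to local disk.
subprocess.run(
    ["rclone", "copy", f"{REMOTE}/experiment-01", "/data/restore/experiment-01", "--progress"],
    check=True,
)
```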
- Data Sharing
- Data can be shared with external users (who don't have accounts) via shareable links
- Data can also be uploaded by external users (who don't have accounts) via shareable links
- Encryption (discuss with RCS Data Team)
- The HTTPS and sFTP protocols support encrypted transfers
- Files can be encrypted at the storage layer (protection against unauthorised access to the system back end only). This is currently only supported with the HTTPS protocol; other protocols will be available in the future.
- Selected meta-data can be encrypted (protection against unauthorised access to system back end only)
- Data Redundancy
- Mediaflux assets (the container of meta-data and data content (e.g. an uploaded file)) are versioned. Whenever your assets change (e.g. modify the meta-data or content) a new version is created. Old versions are retrievable.
- A second Mediaflux server runs at the Noble Park data centre. This is known as the Disaster Recovery (DR) server. Its only job is to receive copies of data
- The DR server is not accessible by normal users and is configured in a more restricted network environment.
- The DR server is not used as a fail-over; that is, if the primary system fails, we cannot switch operations over to the DR server.
- The redundancy process copies all asset versions from the primary server to the DR server. When a new asset version is created, it is sent to the DR server and attached to the appropriate asset.
- Therefore, there are 2 copies of your data managed by Mediaflux (one on the primary system and one on the DR system).
- Data that have been destroyed before they are backed up to the DR server cannot be recovered.
- Data that have been destroyed on the primary server and that have been copied to the DR server are retrievable on request (an administration task).
- There is no user-controlled process that can delete data on the DR server.
- The replication process runs in near real time as assets are created or modified; a conceptual sketch of versioning and DR replication follows this list.
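The sketch below is a conceptual model of the versioning and DR replication behaviour described above. It is not the Mediaflux API; all names and structures in it are illustrative assumptions.

```python
# Conceptual sketch only (not the Mediaflux API): models asset versioning
# and the primary -> DR replication described above.
from dataclasses import dataclass, field

@dataclass
class Asset:
    versions: list = field(default_factory=list)  # each entry is (metadata, content)

primary: dict[str, Asset] = {}
dr: dict[str, Asset] = {}

def modify_asset(asset_id: str, metadata: dict, content: bytes) -> None:
    """Any change to meta-data or content creates a new version; old ones remain."""
    primary.setdefault(asset_id, Asset()).versions.append((metadata, content))

def replicate() -> None:
    """Quasi real-time process: copy any versions the DR server does not yet hold."""
    for asset_id, asset in primary.items():
        dr_asset = dr.setdefault(asset_id, Asset())
        dr_asset.versions.extend(asset.versions[len(dr_asset.versions):])

modify_asset("asset-1", {"title": "run 1"}, b"raw data")
replicate()                      # version 1 now exists on both servers
modify_asset("asset-1", {"title": "run 1 (fixed)"}, b"raw data v2")
del primary["asset-1"]           # destroyed before replication...
# ...so version 2 is unrecoverable, but version 1 can still be restored from DR.
assert len(dr["asset-1"].versions) == 1
```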
- High Availability
- The primary controller (the Mediaflux server that users log in to and interact with) is part of a High Availability pair. If one fails, the service can be moved to the other.
Other Relevant Operational Functionality
- Database Backups
- The database (the component that maintains all your meta-data and knowledge about assets (files)) on the primary controller server is exported and saved every three hours.
- Those DB exports are further replicated (copied) to a second Mediaflux server at Noble Park referred to as the DR (Disaster Recovery) server.
- These exports are retained for 2 weeks. If the DB becomes corrupted, there is therefore a gap of up to 3 hours in which data may have arrived that the restored DB has no record of (the data exists, but the system would not know about it)
- The DB backups are further synced to the Noble Park DR system; when they are removed from the primary after 2 weeks, they are also removed from the DR server. The worked numbers below illustrate this schedule.
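The numbers implied by this schedule can be worked through as follows, assuming exports run exactly every three hours and are retained exactly two weeks.

```python
# Worked arithmetic for the backup schedule described above (assumptions:
# exports exactly every 3 hours, retained exactly 14 days on both servers).
from datetime import timedelta

export_interval = timedelta(hours=3)
retention = timedelta(days=14)

# Maximum window of activity the DB could "forget" if it is corrupted just
# before the next export runs.
worst_case_gap = export_interval
print(f"Worst-case unrecorded window: {worst_case_gap}")    # 3:00:00

# Number of export generations held on the primary (and mirrored to DR).
print(f"Exports retained: {retention // export_interval}")  # 112
```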
- Scalability
- The primary system consists of a controller node (handling database transactions) and 2 IO nodes. The IO nodes are used to actually move data to and from the storage. More IO nodes can be added as needed.
- The IO nodes are currently only utilised for the HTTPS protocol (SMB support is coming)
- The underlying storage is provided via a highly scalable CEPH cluster. More nodes can be added to the cluster as needed.
- The combination of the scalable Mediaflux cluster and scalable CEPH cluster provides a very extensible environment as our data movement needs grow