
  • Projects
    • A project namespace (think of it as a cloud-based file system) where you and your team can securely store and share your data collection
    • You can structure your project namespace however you wish
    • You can associate discoverable meta-data with the structure and with the files that you upload (see the sketch below)
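
To make the two points above concrete, here is a minimal Python sketch of one way a project namespace and its discoverable meta-data could be organised. The paths, folder names and meta-data fields are invented for illustration; Mediaflux does not prescribe this layout.

    # Hypothetical project namespace layout; you choose your own structure.
    project_namespace = "/projects/example-project"

    folders = [
        f"{project_namespace}/raw",
        f"{project_namespace}/processed",
        f"{project_namespace}/docs",
    ]

    # Discoverable meta-data can be associated with the structure itself...
    namespace_metadata = {
        "title": "Example data collection",
        "custodian": "A. Researcher",
        "description": "Raw and processed instrument data",
    }

    # ...and with individual files (assets) as they are uploaded.
    file_metadata = {
        "instrument": "microscope-01",
        "acquired": "2024-05-04",
        "keywords": ["imaging", "pilot-study"],
    }
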
  • Authentication
    • For University of Melbourne staff and students, you can log in directly with your institutional credentials
    • For other users, you can log in with a local account created for you
    • Login via the Australian Access Federation (AAF) is no longer supported, since the AAF no longer supports the required enhanced SAML plugin.
  • Authorisation
    • Whichever account you log in with, it must be granted roles (performed by the Mediaflux support team) before it is authorised to access resources
    • Standard roles are created for each project (admin, read/create/destroy, read/create, read) and can be assigned to project team members (see the sketch below)
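
As a rough illustration of the standard roles listed above, the sketch below maps each role to the operations it permits. The role names follow the list; the mapping itself is only a model, not how Mediaflux represents roles internally.

    # Illustrative mapping of the standard per-project roles to permitted operations.
    # Roles are granted by the Mediaflux support team; this is a model only.
    standard_roles = {
        "admin": {"read", "create", "destroy"},   # admin also carries project administration rights not modelled here
        "read-create-destroy": {"read", "create", "destroy"},
        "read-create": {"read", "create"},
        "read": {"read"},
    }

    def can(role, operation):
        """Return True if the given role permits the operation (sketch only)."""
        return operation in standard_roles.get(role, set())

    # A team member with the read/create role can upload but not destroy.
    assert can("read-create", "create")
    assert not can("read-create", "destroy")
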
  • Access Protocols - Projects can be accessed via
  • Data Movement
  • Data Sharing
    • Data can be shared with external users (who don't have accounts) via shareable links
    • Data can also be uploaded by external users (who don't have accounts) via shareable links (see the download sketch below)
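
From the external collaborator's side, a shareable link is just an HTTPS URL. The sketch below shows a plain download of such a link with the requests library; the URL is a made-up placeholder, and the real link format is whatever Mediaflux issues when the link is created.

    import requests

    # Placeholder shareable link; the real URL is issued by Mediaflux.
    share_url = "https://mediaflux.example.edu.au/share/abc123/dataset.zip"

    # The external user needs no account: the token embedded in the link
    # authorises the download, and HTTPS encrypts it in transit.
    response = requests.get(share_url, timeout=60)
    response.raise_for_status()

    with open("dataset.zip", "wb") as fh:
        fh.write(response.content)
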
  • Encryption (discuss with RCS Data Team)
    • The HTTPS and sFTP protocols support encrypted transfers (see the sFTP sketch below)
    • Files can be encrypted at the storage layer (protection against unauthorised access to the system back end only). This is currently only supported with the HTTPS protocol; other protocols will be supported in the future.
    • Selected meta-data can be encrypted (protection against unauthorised access to system back end only)
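
Both transfer protocols named above encrypt data in transit (HTTPS via TLS, sFTP via SSH). Below is a minimal sFTP upload sketch using the paramiko library; the host name, credentials and remote path are placeholders for whatever your project setup provides.

    import paramiko

    # Placeholder connection details; substitute the values for your project.
    host = "mediaflux-sftp.example.edu.au"
    username = "your-username"
    password = "your-password"

    # sFTP runs over SSH, so the file content is encrypted in transit.
    transport = paramiko.Transport((host, 22))
    try:
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.put("results.csv", "/projects/example-project/raw/results.csv")
        sftp.close()
    finally:
        transport.close()
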
  • Data Redundancy
    • Mediaflux assets (the container of meta-data and data content, e.g. an uploaded file) are versioned. Whenever an asset changes (e.g. you modify its meta-data or content) a new version is created. Old versions are retrievable.
    • A second Mediaflux server runs at the Noble Park data centre. This is known as the Disaster Recovery (DR) server. Its only job is to receive copies of data
      • The DR server is not accessible by normal users and is configured in a more restricted network environment.
      • The DR server is not currently used as a fail-over; that is, if the primary system fails, we cannot switch operations over to the DR server.
    • The redundancy process copies all asset versions from the primary server to the DR server. When a new asset version is created, that new version is sent to the DR server and attached to the appropriate asset.
      • Therefore, there are 2 copies of your data managed by Mediaflux (one on the primary system and one on the DR system).
      • Data that have been destroyed before they are backed up to the DR server cannot be recovered.
      • Data that have been destroyed on the primary server and that have been copied to the DR server are retrievable on request (an administration task).
      • There is no user-controlled process that can delete data on the DR server.
    • The process runs in quasi real time as assets are created or modified (see the sketch below).
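
The sketch below models the behaviour described in this list, without assuming anything about Mediaflux internals: every asset version is copied to a DR store in quasi real time, destroying an asset on the primary leaves already-replicated versions on the DR copy, and anything destroyed before it was replicated is unrecoverable.

    # Conceptual model of primary/DR replication of versioned assets (not Mediaflux code).
    primary = {}   # asset id -> list of versions
    dr_copy = {}   # disaster-recovery copy, not accessible to normal users

    def save_version(asset_id, content):
        """Creating or modifying an asset appends a new version on the primary."""
        primary.setdefault(asset_id, []).append(content)
        replicate(asset_id)                      # quasi real time copy to DR

    def replicate(asset_id):
        """Send any versions the DR server does not yet hold."""
        held = dr_copy.setdefault(asset_id, [])
        held.extend(primary[asset_id][len(held):])

    def destroy(asset_id):
        """Destroy on the primary; versions already replicated remain on DR."""
        primary.pop(asset_id, None)

    save_version("asset-1", b"v1")
    save_version("asset-1", b"v2")
    destroy("asset-1")
    assert "asset-1" not in primary                  # gone from the primary
    assert dr_copy["asset-1"] == [b"v1", b"v2"]      # retrievable on request
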
  • High Availability
    • The primary controller (the Mediaflux server that users log in to and interact with) is part of a High Availability pair. If one fails, the service can be moved to the other.

Other Relevant Operational Functionality

  • Database Backups
    • The database (the component that maintains all your meta-data and knowledge about assets (files)) on the primary controller server is exported and saved every three hours.
    • Those DB exports are further replicated (copied) to a second Mediaflux server at Noble Park referred to as the DR (Disaster Recovery) server.
    • These exports are retained for 2 weeks. This means that if the DB should become corrupted, the window of potential loss is at most 3 hours: data that arrived in that window still exist on storage, but the restored database would have no record of them.
    • The DB backups are further synced to the Noble Park DR system; when they are removed from the primary after 2 weeks they are also removed from the DR server (see the sketch below).
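
A small sketch of the cycle described above, using only the figures given in this list (an export every three hours, retained for two weeks on both the primary and the DR server). It shows why the worst-case gap after a database corruption is bounded by the export interval.

    from datetime import datetime, timedelta

    EXPORT_INTERVAL = timedelta(hours=3)   # database exported every three hours
    RETENTION = timedelta(weeks=2)         # exports kept for two weeks, then removed

    def worst_case_gap(corruption_time, last_export):
        """Meta-data recorded after the last export is lost if the DB is corrupted."""
        return corruption_time - last_export

    def retained_exports(exports, now):
        """Exports older than the retention window are pruned (on primary and DR)."""
        return [t for t in exports if now - t <= RETENTION]

    # Example: corruption 2.5 hours after the last export loses at most that window,
    # which can never exceed EXPORT_INTERVAL (3 hours) while exports keep running.
    last = datetime(2024, 1, 1, 9, 0)
    gap = worst_case_gap(datetime(2024, 1, 1, 11, 30), last)
    assert gap <= EXPORT_INTERVAL
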
  • Scalability
    • The primary system consists of a controller node (handling database transactions) and 2 IO nodes. The IO nodes are used to actually move data to and from the storage. More IO nodes can be added as needed.
      • The IO nodes are currently only utilised for the HTTPS protocol (SMB support is coming)
    • The underlying storage is provided via a highly scalable CEPH cluster. More nodes can be added to the cluster as needed.
    • The combination of the scalable Mediaflux cluster and the scalable CEPH cluster provides a very extensible environment as our data movement needs grow (see the sketch below).
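
As a rough sketch of the scaling idea above: transfers fan out across the IO nodes, so aggregate transfer capacity grows with the number of IO nodes, while the CEPH cluster grows storage capacity the same way. The node names and throughput figure below are invented purely for illustration.

    # Illustrative only: adding IO nodes raises aggregate transfer capacity.
    io_nodes = ["io-node-1", "io-node-2"]     # current deployment has two IO nodes
    PER_NODE_GBPS = 10                        # hypothetical per-node throughput

    def aggregate_throughput(nodes):
        return len(nodes) * PER_NODE_GBPS

    print(aggregate_throughput(io_nodes))     # 20
    io_nodes.append("io-node-3")              # scale out as demand grows
    print(aggregate_throughput(io_nodes))     # 30
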