msaFilesystem: A Practical Way of File System Management

Managing different file systems in modern applications can be messy — but it doesn’t have to be. In this article, written by our developer Eduard, you’ll discover msaFilesystem — an agnostic, abstract filesystem API that simplifies working with S3, Azure Datalake, local storage, YouTube, and more. Eduard explains the challenges, presents the solution, and shares real-world examples using the powerful FileWorker class.

Let’s start with a definition. msaFilesystem is an agnostic, abstract filesystem API that lets you work with S3, GCS, Azure Datalake, your local file system, YouTube, and more.

In this article I describe the underlying problem, the solution I recommend, and some practical examples of using msaFilesystem with different file systems.

Problem description

  • Modern applications often need to work with several types of file systems. Depending on the project’s requirements, these can include local file systems, cloud storage (such as S3, Google Cloud Storage or Azure Datalake), network file systems (such as SMB or WebDAV), and specialized data sources (such as YouTube or Dropbox). Developers then have to use a different library and interface for each system, which makes the code more complex and increases development time.

The reason for choosing the suggested approach

We decided to use msaFilesystem to create a single abstract API for working with different file systems. This approach allows you to use a single library to access all the necessary file systems, which greatly simplifies code development and maintenance. We chose to optimize for FastAPI and Pydantic, as these are modern and popular tools for developing web applications in Python.

The key problems this approach solves

Using msaFilesystem solves several key problems:

  • Unification of interfaces
    Provides a single interface for working with different types of file systems.
  • Integration Simplification
    Reduces the amount of code required to integrate different file systems within an application.
  • Scalability
    Lets you switch between local and cloud storage without changing application code; in the examples below, only the filesystem URL changes.
  • Development optimization
    Integration with FastAPI and Pydantic speeds up the development process and improves code readability.
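
To make the "single interface" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not msaFilesystem code: the Storage protocol, the MemoryStorage and LocalStorage classes, and the save_report function are hypothetical names invented for this illustration. It only shows why application code written against one interface does not change when the backend changes.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Hypothetical minimal interface shared by every backend."""

    def write_text(self, path: str, text: str) -> None: ...
    def read_text(self, path: str) -> str: ...


class MemoryStorage:
    """Keeps files in a dict; handy for tests."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def write_text(self, path: str, text: str) -> None:
        self._files[path] = text

    def read_text(self, path: str) -> str:
        return self._files[path]


class LocalStorage:
    """Stores files under a root directory on the local disk."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def write_text(self, path: str, text: str) -> None:
        target = self._root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")

    def read_text(self, path: str) -> str:
        return (self._root / path).read_text(encoding="utf-8")


def save_report(storage: Storage, name: str, body: str) -> str:
    """Application code: depends only on the Storage interface."""
    storage.write_text(f"reports/{name}.txt", body)
    return storage.read_text(f"reports/{name}.txt")


# The same call works against either backend.
in_memory_result = save_report(MemoryStorage(), "demo", "Hello, memory!")
with tempfile.TemporaryDirectory() as tmp_dir:
    on_disk_result = save_report(LocalStorage(tmp_dir), "demo", "Hello, disk!")
```

A real abstraction layer covers far more (listing, deleting, streaming, permissions), but the switching mechanism is the same: the caller picks the backend, and the rest of the code stays untouched.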

Examples of msaFilesystem usage

The examples below show msaFilesystem working with several file systems.

The local file system

from fs import open_fs

# Open the current working directory as a filesystem object.
local_fs = open_fs('osfs://.')
with local_fs.open('example.txt', 'w') as file:
    file.write('Hello, World!')

The Amazon S3 file system

from fs import open_fs
# Assumes S3 support is installed and AWS credentials are configured.
s3_fs = open_fs('s3://mybucket')
with s3_fs.open('example.txt', 'w') as file:
    file.write('Hello, S3!')

The temporary file system

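from fs import open_fs
# temp:// creates a temporary directory, normally removed when the filesystem is closed.
temp_fs = open_fs('temp://')
with temp_fs.open('example.txt', 'w') as file:
    file.write('Temporary data')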

The core capabilities

The core capabilities are:

  1. Abstract file system
    A single interface for working with various file systems such as S3, GCS, Azure Datalake and local file system.
  2. Support for various protocols and storage backends
    FTP, memory, ZIP, SMB, WebDAV, Azure Datalake, S3, Google Cloud Storage, Google Drive, Dropbox, OneDrive and YouTube.
  3. Optimization for FastAPI/Pydantic
    Simplified integration with modern web frameworks.

The additional capabilities

Beyond these core capabilities, the system offers several additional features.

  • In-memory file systems
    Temporary and cacheable file systems for temporary data storage and testing.
  • Archive support
    Reading and writing archives in formats such as ZIP and TAR.
  • Mounting of virtual file systems
    Subdirectories can be mounted inside other file systems.
  • Network file system support
    SMB, WebDAV and other protocols for secure access to files on the network.
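
The idea of placing one file system inside another (mounting) can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the library's implementation: MountTable and its methods are hypothetical, and plain dicts stand in for real backends.

```python
class MountTable:
    """Hypothetical dispatcher: routes a path to the backend mounted at its prefix."""

    def __init__(self) -> None:
        self._mounts: dict[str, dict[str, bytes]] = {}

    def mount(self, prefix: str, backend: dict[str, bytes]) -> None:
        # Normalise so "archive", "/archive" and "/archive/" are the same mount point.
        self._mounts[prefix.strip("/")] = backend

    def _resolve(self, path: str) -> tuple[dict[str, bytes], str]:
        # The first path segment selects the backend; the rest is the path inside it.
        prefix, _, inner = path.strip("/").partition("/")
        if prefix not in self._mounts:
            raise FileNotFoundError(f"nothing mounted at /{prefix}")
        return self._mounts[prefix], inner

    def write(self, path: str, data: bytes) -> None:
        backend, inner = self._resolve(path)
        backend[inner] = data

    def read(self, path: str) -> bytes:
        backend, inner = self._resolve(path)
        return backend[inner]


# Two independent "file systems", here plain dicts standing in for real backends.
cache_backend: dict[str, bytes] = {}
archive_backend: dict[str, bytes] = {}

mounts = MountTable()
mounts.mount("/cache", cache_backend)
mounts.mount("/archive", archive_backend)

mounts.write("/cache/session.bin", b"temporary")
mounts.write("/archive/2024/report.txt", b"kept")
```

The caller sees one path namespace, while each prefix is served by a different store. In practice the in-memory dict could be swapped for a local, network or cloud backend without the calling code noticing.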

The FileWorker class was created based on msaFilesystem.

FileWorker is a class that provides an interface for working with various files and file systems using msaFilesystem. Below are examples of how to use FileWorker to perform various tasks.

Because our document-processing platform uses FileWorker, it was important for us to separate documents by subdomain (organization). We used folders for this purpose.
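
As a rough illustration of this folder-based separation, the sketch below builds a per-organization path. The layout subdomain/document_id/filename and the document_path helper are assumptions made for this example; the article does not show how FileWorker actually arranges its folders.

```python
from pathlib import PurePosixPath


def document_path(subdomain: str, document_id: str, filename: str) -> str:
    """Build '<subdomain>/<document_id>/<filename>' (an assumed layout)."""
    parts = (subdomain, document_id, filename)
    for part in parts:
        # Reject empty parts and anything that could escape the tenant folder.
        if not part or part in {".", ".."} or "/" in part or "\\" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return str(PurePosixPath(*parts))


invoice_path = document_path("acme", "doc-42", "invoice.pdf")
```

Validating each component matters in a multi-tenant setup: without it, a crafted identifier such as ".." could point into another organization's folder.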

The initialization of FileWorker

To start working with FileWorker, create an instance, specifying the subdomain, the document and client identifiers, and a path to the file system. The example below passes only the subdomain and the document identifier.

from msaFileWorker.file_worker import FileWorker

file_worker = FileWorker(subdomain="example_subdomain", document_id="example_document_id")

The developed methods

FileWorker exposes the following public methods:

save_bytes_file
save_many_bytes_as_files
save_text_as_file
save_many_texts_as_files
get_file_as_zip
get_folder_as_zip
delete_file
delete_folder
get_file_as_bytes
get_file_content

The auxiliary methods

The following private helper methods support them:

_get_zip_one_file
_create_folders_if_not_exist
_zip_with_subfolders
_split_path

All of these methods are simple, and their names describe what they do.

The _zip_with_subfolders method is worth a closer look: its implementation uses recursion to collect all subfolders.

async def _zip_with_subfolders(self, zip_buffer: BytesIO, folder: str, path: str) -> BytesIO:
    """Create a zip archive of all files, including those in all subfolders.

    Parameters:
        zip_buffer: BytesIO object
        folder: name of the folder
        path: path to the folder
    Returns:
        BytesIO
    """

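The body of the method is not shown here, so the following is only a sketch of how such a recursive zip could work, using the standard library on a local directory. It is synchronous and standalone, unlike the async method above, and the names zip_with_subfolders and folder_as_zip are chosen for this example rather than taken from FileWorker.

```python
import os
import tempfile
import zipfile
from io import BytesIO


def zip_with_subfolders(archive: zipfile.ZipFile, folder: str, arcname: str = "") -> None:
    """Add every file under `folder` to `archive`, recursing into subfolders."""
    for entry in sorted(os.scandir(folder), key=lambda e: e.name):
        entry_arcname = f"{arcname}/{entry.name}" if arcname else entry.name
        if entry.is_dir(follow_symlinks=False):
            # Recursive step: descend, extending the path inside the archive.
            zip_with_subfolders(archive, entry.path, entry_arcname)
        elif entry.is_file():
            archive.write(entry.path, entry_arcname)


def folder_as_zip(folder: str) -> BytesIO:
    """Return an in-memory zip of `folder`."""
    zip_buffer = BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        zip_with_subfolders(archive, folder)
    zip_buffer.seek(0)  # rewind so callers can read from the start
    return zip_buffer


# Usage: build a small tree, zip it, and list what ended up in the archive.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for relative in ("top.txt", "a/mid.txt", "a/b/deep.txt"):
        with open(os.path.join(root, *relative.split("/")), "w") as handle:
            handle.write(relative)
    zipped_names = sorted(zipfile.ZipFile(folder_as_zip(root)).namelist())
```

Note that this sketch skips empty folders, because only files are written to the archive. Whether the original method keeps them is not stated.
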
Disclaimer

msaFilesystem was created exclusively for the internal needs of our company and is intended to simplify work with file systems in various environments. The main goal of developing this package was to create a convenient and effective tool for integrating various types of file systems, allowing developers to focus on solving business problems rather than on the routine work of service calls and setup.