How does Magisk work?


Question

Magisk is known as a “systemless” root method. It’s essentially a way to modify the system without actually modifying it. Modifications are stored safely in the boot partition instead of modifying the real system files.


I have looked around but did not find a sufficient explanation of how it actually works. How is root access gained and maintained? What exactly is the role of the boot partition, and if it integrates with the system partition, how does it do so?


I could not find a really detailed description of how it works anywhere I searched, so one would be highly appreciated.


Answer

Most of your question is covered in the Magisk Documentation. I will quote one of my previous answers to a different question, with some unnecessary details :)


PREREQUISITES:


To have a comprehensive understanding of how Magisk works, one must have a basic understanding of:



  • Discretionary Access Control (DAC)

  • User identifiers ([ESR]UID), set-user-ID

  • Linux Capabilities (process and file) which provide a fine-grained control over superuser permissions

  • Mandatory Access Control (MAC)

  • SELinux on Android

  • Mount namespaces, Android's usage of namespaces for Storage Permissions

  • Bind mount

  • Android boot process, partitions and filesystems

  • Android init (the very first process started by the kernel) and its services

  • *.rc files

  • Structure of boot partition (kernel + DTB + ramdisk), Device Tree Blobs, DM-Verity (Android Verified Boot), Full Disk Encryption / File Based Encryption (FDE/FBE) etc.


WHAT IS ROOT?


Gaining root privileges means running a process (usually a shell) with UID zero (0) and all of the Linux capabilities, so that the privileged process can bypass all kernel permission checks.

Superuser privileges are usually gained by executing a binary which has either:

  • The set-user-ID-root (SUID) bit set on it. This is how su and sudo work on Linux in traditional UNIX DAC. Non-privileged users execute these binaries to get root rights.

  • File capabilities set on it. This is the less common method.


In both cases the calling process must have all capabilities in its Bounding Set (one of the 5 capability sets a process can have) to have real root privileges.
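To make the "UID 0 plus all capabilities" definition concrete, here is a minimal sketch that reads the UIDs and the five capability sets from /proc/<pid>/status-style text. The sample status texts and the default last capability number (37) are illustrative assumptions; the real value depends on the kernel (see /proc/sys/kernel/cap_last_cap).

```python
# Capability numbers as defined in <linux/capability.h>
CAP_SETGID = 6
CAP_SETUID = 7

# The five per-process capability sets exposed in /proc/<pid>/status
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_status(text):
    """Extract the UIDs and capability masks from /proc/<pid>/status text."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Uid":
            # Order: real, effective, saved, filesystem UID
            info["uids"] = tuple(int(v) for v in value.split())
        elif key in CAP_FIELDS:
            info[key] = int(value.strip(), 16)
    return info


def has_cap(mask, cap):
    """True if capability number `cap` is set in the bitmask."""
    return bool((mask >> cap) & 1)


def is_full_root(info, last_cap=37):
    """Effective UID 0 and every capability up to last_cap effective.

    last_cap=37 is an assumption for this sketch, not a constant.
    """
    full_mask = (1 << (last_cap + 1)) - 1
    return info["uids"][1] == 0 and info["CapEff"] == full_mask


# Illustrative samples, not captured from a real device
ROOT_SHELL_STATUS = (
    "Uid:\t0\t0\t0\t0\n"
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000003fffffffff\n"
    "CapEff:\t0000003fffffffff\n"
    "CapBnd:\t0000003fffffffff\n"
    "CapAmb:\t0000000000000000\n"
)
APP_STATUS = (
    "Uid:\t10123\t10123\t10123\t10123\n"
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000000000000000\n"
    "CapEff:\t0000000000000000\n"
    "CapBnd:\t0000000000000000\n"
    "CapAmb:\t0000000000000000\n"
)
```

On a real device you could feed it the contents of /proc/self/status instead of the samples.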


HOW ANDROID RESTRICTS ROOT ACCESS?


Up to Android 4.3, an app could simply execute a set-user-ID-root su binary to elevate its permissions to those of the root user. However, there were a number of Security Enhancements in Android 4.3 which broke this behavior:



  • Android switched to file capabilities instead of relying on set-user-ID binaries, which are a common source of security vulnerabilities. A more secure mechanism, ambient capabilities, was also introduced in Android Oreo.

  • System daemons and services can make use of file capabilities to gain process capabilities (see under Transformation of capabilities during execve), but apps can't do that either, because application code is executed by zygote with the process control attribute NO_NEW_PRIVS, so both set-user-ID and file capabilities are ignored. SUID is also ignored because /system and /data are mounted with the nosuid option for all apps.

  • The UID can be switched only if the calling process has the SETUID/SETGID capability in its Bounding Set. But Android apps are made to run with all capabilities already dropped in all sets, using the process control attribute CAPBSET_DROP.

  • Starting with Oreo, apps' ability to change UID/GID has been further suppressed by blocking certain syscalls using seccomp filters.
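The restrictions in the list above can all be observed from userspace. The following is a rough sketch, not a definitive checker: it takes /proc/<pid>/status-style and /proc/mounts-style text and reports which of the four obstacles would stop a set-user-ID su binary. The function names and the sample texts are made up for illustration.

```python
CAP_SETUID = 7  # from <linux/capability.h>


def status_fields(text):
    """Map field name -> raw value for /proc/<pid>/status-style text."""
    pairs = (line.partition(":") for line in text.splitlines())
    return {key: value.strip() for key, _, value in pairs if value}


def mount_options(mounts_text, mount_point):
    """Return the option set of mount_point from /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[1] == mount_point:
            return set(parts[3].split(","))
    return set()


def su_blockers(status_text, mounts_text, mount_point):
    """List the reasons a SUID su on mount_point would not grant root."""
    fields = status_fields(status_text)
    reasons = []
    if "nosuid" in mount_options(mounts_text, mount_point):
        reasons.append("nosuid mount")
    if fields.get("NoNewPrivs") == "1":
        reasons.append("no_new_privs set")
    if not (int(fields.get("CapBnd", "0"), 16) >> CAP_SETUID) & 1:
        reasons.append("CAP_SETUID missing from bounding set")
    if fields.get("Seccomp", "0") != "0":
        reasons.append("seccomp filter active")
    return reasons


# Illustrative samples, not captured from a real device
APP_STATUS = "NoNewPrivs:\t1\nSeccomp:\t2\nCapBnd:\t0000000000000000\n"
APP_MOUNTS = "/dev/block/dm-0 /system ext4 ro,seclabel,nosuid,relatime 0 0\n"

DAEMON_STATUS = "NoNewPrivs:\t0\nSeccomp:\t0\nCapBnd:\t0000003fffffffff\n"
DAEMON_MOUNTS = "/dev/root / ext4 rw,relatime 0 0\n"
```

For the app-like sample all four reasons are reported, while the unrestricted daemon-like sample yields an empty list.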


Since the standalone su binaries stopped working with the release of Android 4.3 (Jelly Bean), a transition was made to su daemon mode. This daemon is launched during boot and handles all superuser requests made by applications when they execute the special su binary (1). To launch this su daemon on boot, install-recovery.sh (located under /system/bin/ or /system/etc/) was used. This script is executed by a pre-installed init service, flash_recovery, which updates recovery after an OTA installation and is useless for adventurers.


The next major challenge came when SELinux was set to strictly enforcing mode with the release of Android 5.0. The flash_recovery service was confined to a restricted SELinux context, u:r:install_recovery:s0, which ended unrestricted access to the system. Even UID 0 was limited to a very small set of tasks on the device. So the only viable option was to start a new service with an unrestricted SUPER CONTEXT by patching the SELinux policy. That's what was done (temporarily for Lollipop (2, 3) and then permanently for Marshmallow) and that's what Magisk does.


HOW MAGISK WORKS?


Flashing Magisk usually requires a device with an unlocked bootloader, so that boot.img can be dynamically modified from a custom recovery (4) or a pre-modified boot.img (5) can be flashed/booted, e.g. from fastboot.

As a side note, it's possible to start Magisk on a running ROM if you somehow get root privileges using some exploit in the OS (6). However, most such security vulnerabilities have been fixed over time (7).

Also, due to some vulnerabilities at SoC level (such as Qualcomm's EDL mode), a locked bootloader can be hacked to load a modified boot / recovery image, breaking the Chain of Trust. However, these are only exceptions.


Once the device boots from the patched boot.img, a fully privileged Magisk daemon (with UID 0, full capabilities and an unrestricted SELinux context) runs from the very start of the boot process. When an app needs root access, it executes Magisk's (/sbin/)su binary (world-accessible under both DAC and MAC), which doesn't change UID/GID on its own, but just connects to the daemon through a UNIX socket (8) and asks it to provide the requesting app with a root shell with all capabilities. In order to interact with the user to grant/deny su requests from apps, the daemon is hooked up to the Magisk Manager app, which can display user interface prompts. A database (/data/adb/magisk.db) of granted/denied permissions is built by the daemon for future use.
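The client/daemon split described above can be sketched roughly like this. It is only a toy model: the one-line message format, the function names, the UIDs and the in-memory POLICY dict (standing in for the permissions database) are all hypothetical and are not Magisk's real protocol. A real daemon would also normally learn the caller's identity from the kernel instead of trusting what the client sends.

```python
import socket
import threading

# Hypothetical stand-in for the granted/denied database; UIDs are made up
POLICY = {10123: "allow", 10456: "deny"}


def daemon_handle(conn, policy):
    """Privileged side: read the requesting UID and answer with a decision."""
    with conn:
        uid = int(conn.recv(64).decode())
        # Unknown apps would trigger a prompt in the manager app
        decision = policy.get(uid, "ask")
        conn.sendall(decision.encode())


def su_request(conn, uid):
    """Unprivileged side: the su stub only talks to the daemon."""
    with conn:
        conn.sendall(str(uid).encode())
        return conn.recv(64).decode()


def ask(uid, policy=POLICY):
    """Run one request/response round trip over a connected socket pair."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=daemon_handle, args=(server, policy))
    worker.start()
    try:
        return su_request(client, uid)
    finally:
        worker.join()
```

The point of the design is that the su stub itself never needs any privilege: all it does is pass a request to a process that was already privileged from boot.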


Booting Process:

The Android kernel starts init with SELinux in permissive mode on boot (with a few exceptions). init loads /sepolicy (or the split policy) before starting any services/daemons/processes, sets SELinux to enforcing and then switches to its own context. From then on, even init isn't allowed by policy to revert to permissive mode (9, 10). Nor can the policy be modified, even by the root user (11). Therefore Magisk replaces the /init file with a custom init which patches the SELinux policy rules to add the SUPER CONTEXT (u:r:magisk:s0) and defines the service that launches the Magisk daemon with this context. Then the original init is executed to continue the boot process (12).
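To illustrate what "defining a service with this context" looks like, here is a small sketch that renders a service stanza in the Android init language with a seclabel option. The service name, binary path and argument are assumptions chosen for the example; this is not the actual text Magisk injects.

```python
def render_service(name, path, args=(), seclabel="u:r:magisk:s0"):
    """Render an init .rc service stanza that runs as root in `seclabel`."""
    header = " ".join(["service", name, path, *args])
    options = ["user root", "seclabel " + seclabel]
    lines = [header] + ["    " + option for option in options]
    return "\n".join(lines) + "\n"


# Hypothetical service name, path and argument, for illustration only
EXAMPLE_STANZA = render_service("magisk_daemon", "/sbin/magisk", ["--daemon"])
```

Because the policy has already been patched to define that context before the original init loads it, init can start the service in it like any other service.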


Systemless Working:

Since the init file is built into boot.img, modifying it is unavoidable, and /system modification becomes unnecessary. That's where the term systemless was coined (13, 14). The main concern was to make OTAs easier - re-flashing the boot image (and recovery) is less hassle than re-flashing system. A block-based OTA will fail on a modified /system partition, because it patches the partition at block level and enables the use of dm-verity to cryptographically sign it, so any changed block no longer matches what the update expects.
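Why a single changed file on /system is enough to break block-level verification can be shown with a simplified sketch. It hashes an image in fixed-size blocks and compares against the expected hashes. Real dm-verity uses a hash tree and a signed root hash, so treat this only as an illustration of the idea; the 4096-byte block size is an assumption of the sketch.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_hashes(image):
    """Return the SHA-256 digest of every BLOCK_SIZE chunk of the image."""
    return [
        hashlib.sha256(image[offset:offset + BLOCK_SIZE]).digest()
        for offset in range(0, len(image), BLOCK_SIZE)
    ]


def changed_blocks(image, expected):
    """Return the indices of blocks whose hash differs from `expected`."""
    actual = block_hashes(image)
    if len(actual) != len(expected):
        raise ValueError("image size does not match the expected hash list")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# A stand-in "stock" image of four zero-filled blocks
stock = bytes(BLOCK_SIZE * 4)
expected = block_hashes(stock)

# Flipping one byte, as any file modification would, taints its block
modified = bytearray(stock)
modified[2 * BLOCK_SIZE + 10] = 1
```

Here `changed_blocks(stock, expected)` is empty, while the modified copy reports block 2 as mismatching.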


System-as-root:

On newer devices using system-as-root, the kernel doesn't load its root filesystem from the ramdisk in boot but from system. So [system.img]/init needs to be replaced with Magisk's init. Magisk also modifies /init.rc and places its own files in /root and /sbin. That would mean modifying system.img, but Magisk's approach is not to touch the system partition.


On A/B devices, during a normal boot the skip_initramfs option is passed by the bootloader in the kernel cmdline, because boot.img contains the ramdisk for recovery. So Magisk patches the kernel binary to always ignore skip_initramfs, i.e. to always boot into the recovery ramdisk, and places the Magisk init binary in the recovery ramdisk inside boot.img. When the kernel boots into that ramdisk, if there's no skip_initramfs, i.e. the user intentionally booted to recovery, Magisk init simply executes the recovery init. Otherwise Magisk init mounts system.img at /system_root, the contents of the ramdisk are then copied to / (cleaning out everything that previously existed there), files are added/modified in rootfs /, /system_root/system is bind-mounted to /system, and finally [/system]/init is executed (15, 16).
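The decision Magisk init makes on A/B devices, as described above, can be summarised in a few lines. This is only a model of the control flow: the function name is made up, the sample cmdlines are illustrative, and the steps are returned as plain strings rather than actually performed.

```python
def magisk_init_plan(cmdline):
    """Return the ordered steps for a given kernel cmdline (sketch only)."""
    flags = cmdline.split()
    if "skip_initramfs" not in flags:
        # The user intentionally booted to recovery
        return ["exec recovery init"]
    return [
        "mount system.img at /system_root",
        "copy ramdisk contents to /, cleaning what existed before",
        "add/modify files in rootfs /",
        "bind-mount /system_root/system to /system",
        "exec /system/init",
    ]


# Illustrative kernel command lines
NORMAL_BOOT = "console=ttyMSM0 skip_initramfs rootwait ro"
RECOVERY_BOOT = "console=ttyMSM0 rootwait ro"
```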


However, things changed again with Android Q: now the system partition is mounted at /, but the files to be added/modified, like /init, /init.rc and /sbin, are overlaid with bind mounts (17).


On non-A/B system-as-root devices, Magisk needs to be installed to the recovery ramdisk in order to retain the systemless approach, because boot.img contains no ramdisk (18).


Modules:

An additional benefit of the systemless approach is the use of Magisk Modules. If you want to place some binaries under /system/*bin/, or modify some configuration files (like hosts or dnsmasq.conf) or some libraries / framework files (such as those required by mods like XPOSED) in /system or /vendor, you can do that without actually touching the partition by making use of Magic Mount (based on bind mounts). Magisk supports adding as well as removing files by overlaying them.
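The effect of overlaying module files on top of /system can be modelled with plain dictionaries. This sketch only shows the resulting view, not how bind mounts are set up; the REMOVED marker, the file paths and the file contents are hypothetical.

```python
REMOVED = None  # hypothetical marker meaning "hide this path"


def effective_view(system, modules):
    """Compute what processes see after overlaying modules on system.

    `system` and each module map a path to its content. The original
    `system` mapping is never modified, mirroring the systemless idea.
    """
    view = dict(system)
    for module in modules:
        for path, content in module.items():
            if content is REMOVED:
                view.pop(path, None)
            else:
                view[path] = content
    return view


# Illustrative stand-ins for partition contents and two modules
SYSTEM = {
    "/system/etc/hosts": "127.0.0.1 localhost",
    "/system/app/Bloat.apk": "<apk bytes>",
}
HOSTS_MODULE = {"/system/etc/hosts": "127.0.0.1 localhost\n0.0.0.0 ads.example"}
TOOLS_MODULE = {
    "/system/xbin/busybox": "<binary>",
    "/system/app/Bloat.apk": REMOVED,
}
```

Removing the modules gives back the stock view, since the underlying mapping (the partition, in the real case) was never written to.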


MagiskHide: (19)

Another challenge was to hide the presence of Magisk so that apps won't be able to know if the device is rooted. Many apps don't like rooted devices and may stop working. Google was among the most affected, so they introduced SafetyNet as a part of Play Protect, which runs as a GMS (Play Services) process and tells apps (including their own Google Pay), and hence their developers, whether the device is currently in a non-tampered state (20).


Rooting is one of the many possible tampered states, others being un-Verified Boot, unlocked bootloader, CTS non-certification, custom ROM, debuggable build, permissive SELinux, ADB turned on, some bad properties, presence of Lucky Patcher, Xposed etc. Magisk uses some tricks to make sure that most of these tests always pass, though apps can make use of other Android APIs or read some files directly. Some modules provide additional obfuscation.


Other than hiding its presence from Google's SafetyNet, Magisk also lets users hide root (the su binary and any other Magisk related files) from any app, again making use of bind mounts and mount namespaces. For this, zygote has to be continuously watched for newly forked apps' VMs.


However, it's a tough task to really hide a rooted device from apps, as new techniques evolve to detect Magisk's presence, mainly from /proc or other filesystems. So a number of workarounds are applied to properly support hiding modifications from detection. Magisk tries to remove all traces of its presence during the boot process (21).
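The per-app hiding idea can be modelled as follows: every app gets its own copy of the mount table (its mount namespace), and the Magisk-related mounts are detached only in the copy belonging to an app on the hide list. This is a toy model on /proc/mounts-style text; the sample mount table and the hidden paths are illustrative assumptions.

```python
def mount_points(mounts_text):
    """Return the mount points listed in /proc/mounts-style text."""
    rows = (line.split() for line in mounts_text.splitlines())
    return [parts[1] for parts in rows if len(parts) >= 2]


def is_under(path, root):
    """True if path equals root or lies below it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def app_view(global_mounts, hidden_roots):
    """Mounts visible in a namespace where hidden_roots were detached."""
    return [
        mount
        for mount in global_mounts
        if not any(is_under(mount, root) for root in hidden_roots)
    ]


# Illustrative global mount table
GLOBAL_MOUNTS = mount_points(
    "rootfs / rootfs ro 0 0\n"
    "tmpfs /sbin tmpfs rw 0 0\n"
    "/dev/block/dm-0 /system ext4 ro 0 0\n"
    "/dev/block/dm-1 /system/etc/hosts ext4 ro 0 0\n"
)
HIDDEN = ["/sbin", "/system/etc/hosts"]

hidden_app = app_view(GLOBAL_MOUNTS, HIDDEN)   # app on the hide list
normal_app = app_view(GLOBAL_MOUNTS, [])       # any other app
```

The app on the hide list sees only / and /system, while every other process keeps the full view, which is why each newly forked app process has to be caught and handled individually.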




Magisk also supports:



That's a brief description of Magisk's currently offered features (AFAIK).






