`/etc/machine-id` should not be inherited from templates #8833
Comments
I agree. Privacy in Qubes OS is very bad. |
This was already explained in the FAQ: what-about-privacy-in-non-whonix-qubes. Whonix provides a fixed machine-id for all users. Machine-id is one identifier of many; there are many ways to fingerprint a VM. If Qubes starts focusing on these aspects, it will be redoing work already done by Whonix and taking time away from developers who could be focusing on security issues. |
I am reporting this for Qubes OS. I am also showing what the original creators of systemd explain about machine-id.
Just because Whonix devs think/say something, does not automatically mean it is irrevocable and absolute. Tails (AFAIK) uses volatile machine-id.
|
Tails is focused (mostly) on privacy too. Standard Qubes VMs are not - as @ben-grande explained above. That said, a volatile machine-id will be a problem for StandaloneVMs, where it should remain constant (and where a persistent journal also makes sense). But everywhere else, a machine-id shared between AppVMs may indeed be problematic. Maybe we can specify it via kernel cmdline (`systemd.machine-id=`) based on VM's UUID property (which is guaranteed to be unique, yet persistent)? |
Should the journal be persistent in some places but not others? It’s not too hard to make TemplateBasedVMs have a persistent journal.
What about renaming a qube or restoring it from backup? |
Both (currently) will result in a fresh UUID. But given those are rare events, I don't think it's a huge issue in practice. |
Making journal persistent in TemplateBasedVMs may be useful in some cases too (but also, #830 ), but it isn't really topic of this issue. |
> Standard Qubes VMs are not [focused on privacy]
It seems relevant to clarify some things, both for the sake of completeness and to avoid further confusion:
### Project focus
From the homepage of Tails:
"Activists use Tails to hide their identities, avoid censorship, and communicate securely."
From the homepage of Whonix:
"As handy as an app - delivering maximum anonymity and security."
"Whonix runs like an app inside your operating system - keeping you safe and anonymous."
So, Tails and Whonix are actually focused on anonymity, although they say "privacy".
### Privacy != Anonymity
Privacy is about data confidentiality. Anonymity (not having a name) is about hiding one's identity (in a way, meta-data confidentiality). The two things may be related but they are not equivalent.
Example 1: A bank account is private. It is not anonymous though. So, there is no goal to preserve anonymity during transactions.
Example 2: A whistle blower may need to be anonymous, although the result of his activity is public. The goal is to protect anonymity, not the privacy of the data.
Confidentiality is a component of data security:
https://en.wikipedia.org/wiki/Infosec#Confidentiality
and Qubes OS is security-focused. This also makes it confidentiality (privacy) focused. It provides actual mechanisms for securing it. Whonix does not provide that. It relies on the existing mechanisms of Qubes and Tor for its goals and builds upon these existing systems.
### How other mentioned projects handle machine-id
* Whonix
By enforcing the same machine-id for every user, Whonix attempts to use a "hide in the crowd" approach (http://www.dds6qkxpwdeubwucdiaord2xgbbeyds25rbsgr73tbfpqpt4a6vjwsyd.onion/wiki/Protocol-Leak-Protection_and_Fingerprinting-Protection#Identifiers_Design_Goals), justifying it with:
1. "The Tor Project coined this `Anonymity Loves Company` (good web search term). Whonix attempts to be an extension of Tor. Therefore follows similar design principles."
2. Logic that quasi-identifiers (https://en.wikipedia.org/wiki/Quasi-identifier) (which they seem to call non-deterministic artifacts) can result in VM fingerprinting anyway.
There are several problems with that reasoning though:
1. Differential privacy is weak (https://en.wikipedia.org/wiki/Differential_privacy#Public_purpose_considerations), especially considering the obvious fact that Whonix users are a minority compared to all Tor users, compared to all other Internet users. I.e. hiding in a crowd makes sense only if the crowd is large enough.
2. The Whonix article says it is "realistically impossible" to disguise the fact that one is using Whonix. However, just because quasi-identification may be possible does not mean it should be deliberately facilitated, nor does it mean that machine-id (which is considered confidential by design, especially in untrusted and networked environments) should be made deliberately public. This does not make the crowd larger.
In summary, neither the logic nor its effect serves the actual project goal. This can be a long discussion and should be taken up with Whonix devs. In Whonix forums there is at least one leading nowhere (http://forums.dds6qkxpwdeubwucdiaord2xgbbeyds25rbsgr73tbfpqpt4a6vjwsyd.onion/t/anonymize-etc-machine-id/7721).
* Tails
An 8-year-old open issue:
https://gitlab.tails.boum.org/tails/tails/-/issues/7100
without resolution. They also seem to imagine some crowd.
That pretty much summarizes the situation with these so called privacy focused projects.
> Maybe we can specify it via kernel cmdline (`systemd.machine-id=`) based on VM's UUID property (which is guaranteed to be unique, yet persistent)?
Which particular Qubes OS goal requires machine-id persistence?
I have it volatile in my templates (and hence qubes), it doesn't seem to cause any problems whatsoever. It is also easy to do, as explained.
|
Well, it's clearly stated in the FAQ already...
For example I see some config files in user home are built based on machine-id (pulseaudio settings for example), if machine-id will change, those will a) not be correctly loaded and b) will accumulate in large number over time (for every machine-id). That's just one example, there are surely more. |
> Well, it's clearly stated in the FAQ already...

What I explained is not stated in the FAQ. The FAQ section contains the same inaccurate implication that privacy and anonymity are the same thing. It also attempts to oppose privacy to security in one sentence, which makes it even more contradictory, because confidentiality is an essential part of security, while anonymity is not. This confusion is a separate issue in itself.
> For example I see some config files in user home are built based on machine-id (pulseaudio settings for example), if machine-id will change, those will a) not be correctly loaded and b) will accumulate in large number over time (for every machine-id).
Well, for pulseaudio I see those are generated on each machine-id change (i.e. on each reboot). Making ~/.config/pulse volatile solves b). I don't observe an issue with a), so persistence seems not required.
If machine-id is persistent in AppVM, that would make it persistent in disposables based on that AppVM, which would also contradict the paragraph quoted in the OP.
> That's just one example, there are surely more.
Maybe we need to have a complete list to evaluate the actual effect of it.
We should probably note that machine-id is another systemd thing (by the "good" Red Hat, who now explain to us that this anti-privacy feature should be confidential), so not using systemd would make it unnecessary. But I guess that's not an option.
|
Most likely because you don't change volume inside a qube. But users of sys-gui/sys-gui-gpu (which is a goal to make more common in further releases) will see it more commonly.
With the method using the qube's UUID and kernel cmdline (or another way to transfer that UUID into machine-id), that won't be an issue, as each created disposable qube has a fresh UUID.
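To make the UUID idea concrete, here is a minimal Python sketch, not anything Qubes OS ships. It strips the hyphens from a UUID to get the 32-character lowercase hex form that machine-id uses and builds the matching kernel cmdline option. The function names and the example UUID are made up for illustration. The option is spelled `systemd.machine_id=` here, as in systemd's kernel-command-line documentation; the thread writes it with a hyphen.

```python
import uuid


def machine_id_from_uuid(qube_uuid: str) -> str:
    """Return the UUID as 32 lowercase hex characters (machine-id format)."""
    # uuid.UUID validates the input and .hex drops the hyphens.
    return uuid.UUID(qube_uuid).hex


def kernel_cmdline_option(qube_uuid: str) -> str:
    """Build the kernel cmdline option carrying the derived machine-id."""
    return f"systemd.machine_id={machine_id_from_uuid(qube_uuid)}"


if __name__ == "__main__":
    # Hypothetical UUID, for illustration only.
    print(kernel_cmdline_option("123e4567-e89b-42d3-a456-426614174000"))
```

Because the value is derived only from the UUID, it stays the same across reboots of one qube and differs for every disposable, since each disposable gets a fresh UUID.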
Maybe. But IMO more productive approach is to focus on what machine-id should be, based on its specification. It specifies that:
which currently indeed is broken; but also it specifies that:
which would be broken if it's generated randomly on each start of a persistent qube (be it standalone, or template-based one). |
> Most likely because you don't change volume inside a qube.
Yes.
> But users of sys-gui/sys-gui-gpu (which is a goal to make more common in further releases) will see it more commonly.
I wish I could test this and provide feedback. Unfortunately, I am stuck with the GUI VM, unless someone explains how to proceed with this issue:
#8657
As for UUID, thanks for explaining. I think you are right. That would match better the way it is supposed to work and won't be an issue for disposables.
|
Still, it will cause stuff to break, which isn’t awesome. |
Preserving UUID across rename is probably fixable. For backup restore it's a bit more tricky (as you can restore a qube from a backup when having that qube present already; or restore it multiple times). But also, it isn't going to be much different from qube clone, which also would need to result in a new UUID (and to preserve the confidentiality of machine-id - machine-id too). |
Need to consider the threat model. Which software is reading `/etc/machine-id` under which circumstances?
Once tracking software is locally running under the mentioned threat model, it is much better for users to at least use a VM. Otherwise the tracking software can read hardware information and even hardware serial numbers. If using Whonix, what is the least bad choice here? An Locally running tracking software can also trivially create its own locally unique identifier: generate a random number and write it to a file in the home folder to read it after reboot.
Right. Again, same for Windows, Debian, Qubes, Tails, Whonix, ... We will also probably need to define the scope of your feature request. Naturally users won't care about the implementation specifics such as "
This issue is not specific to Qubes, Debian, Tails, Whonix, etc. To my knowledge, there is no operating system which offers such a feature. Even the terminology is non-existent. The awareness of this issue is non-existent. So if someone wanted to make progress with this topic, they would need to find/invent terminology, explain the issue and then draft feature requests to send to various projects, or found / fund the development of projects working on this.
deliberately facilitated?
This is evidence of how hard it is to find consensus on this topic. But please don't blame it on the pro-privacy projects that tracking software is doing whatever it can to track users and that other upstreams don't care about this issue. A mess created by thousands of people isn't trivially fixed by a handful of people. There is research on anonymity (anonbib - Selected Papers in Anonymity), but at the time of writing I don't think there is any research related to local code execution anti-fingerprinting. Hence, it is difficult to reason about these things.

How exactly does the shared `/etc/machine-id` lead back to your real identity?

What do you call Tor? Also a so called privacy focused project, because Tor doesn't even implement any local code execution anti-tracking? Wondering under that viewpoint, do any real (not so called) privacy projects exist?

If you look at the Whonix history... In summary... It was hard to have a VM that reliably routes all traffic over Tor. Whonix solves that. How is that not an improvement? And also doing a lot of other stuff that is doable. But now you're shifting the goal post. Now you want to include a threat model where tracking software is running with local code execution. And if that's not provided, you call it a "so called privacy focused project".

Check out CPUID. How will you fix at least that?

Related links that were not referenced here yet:

I interpret the Qubes FAQ, What about privacy in non-Whonix qubes? as preemptive rejection of such feature requests. It's not a stated project goal. It's even a deliberately excluded project goal. Qubes "only" wants to keep other VMs safe from each other. One VM where malware is running should be unable to read data from other VMs. Hence a compromised browsing VM cannot read the gpg private keys stored in a vault VM. Privacy with respect to what information locally running malware can gather by executing inside the VM is, however, not a stated project goal. And I don't blame Qubes for that. Knowing how ridiculously difficult (read: expensive) it would be to implement this, it seems only natural to exclude unrealistic goals.

This isn't even a feature request so to speak. It's kinda a "project request". I don't see this happening. Except, perhaps "money talks". Maybe someone like Marek could estimate or at least guesstimate how much it would cost to implement any of the above mentioned features in work hours and/or monetary terms. But even when asking for estimates, it might be unrealistic to expect an answer. The preliminary research needed to make an estimate also takes time, which isn't likely to be spent if the end result is just finding that out with no further action realistically happening.

Why did I say "realistically impossible"? Well, for this to happen, what would technically need to happen is changing the source files on other people's computers. That's not something I can easily do. Where? In upstream projects such as Linux, virtualizers, Debian, perhaps systemd. But they don't particularly care about my opinion. And that is fine and to be expected. There are thousands of people who want various stuff from them. Basically asking them to spend their life time. All for free or even against payment. So this issue needs to be explained, and patches that are acceptable to upstream need to be written. That is a slow crawling process and might hit a wall at some point because upstream doesn't care about this issue: they either don't see it as an issue, or see it as not important enough, or as not realistic to solve.

Maybe it could happen if a billionaire or millionaire such as Mark Shuttleworth showed up, as he did when he founded Ubuntu with I don't know how many millions of USD. If that happens, yeah, maybe GNU Hurd or some other microkernel can be forked, with local code execution anti-fingerprinting enshrined as a project goal, and Xen could add anti local fingerprinting, etc. That's a very long shot and I find that unrealistic.

Disclaimer: This is my own opinion only. Not speaking for Qubes. |
I think the main concern with this issue that could be Qubes-specific is how to remove the unique fingerprints in the templates so the AppVMs based on the same template won't be undoubtedly linked. Here is an example: Now to circumvent this I can do this: This was written without much thought so maybe this will still leave some shared fingerprints in the templates. |
Does removing the machine-id in any meaningful way make it more difficult to link VMs through the template? The VMs would still be using the same root FS, and the filenames and timestamps across the root FS are probably a unique fingerprint in themselves. You would also have data generated by the template when installing or updating software, like dpkg.log, which would share the same timestamps across all VMs using the same template. |
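As a rough illustration of the point about filenames and timestamps, the following Python sketch hashes the relative path and modification time of every file under a directory. Two qubes sharing the same root filesystem would produce the same digest regardless of what machine-id contains. `tree_fingerprint` is a hypothetical helper written for this example, not an existing tool.

```python
import hashlib
import os


def tree_fingerprint(root: str) -> str:
    """Hash relative file paths and mtimes under root into one hex digest."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                mtime_ns = os.lstat(path).st_mtime_ns
            except OSError:
                continue  # file vanished or is unreadable; skip it
            rel = os.path.relpath(path, root)
            record = f"{rel}\0{mtime_ns}\n"
            digest.update(record.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()


if __name__ == "__main__":
    # Walks the current directory only; pick any directory to compare.
    print(tree_fingerprint("."))
```

File contents are not even read here; names and timestamps alone are enough to tell one template build from another.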
Removing just machine-id doesn't solve this issue of course.
If the user doesn't make any changes to the default files in the template and only uses the system package manager to install/remove software in the template, then it's possible to use e.g.
The logging in templates will need to be disabled or stored in private storage, like the template /home directory. I don't know if it is even possible to achieve this at all. Just some thoughts. |
To be clear, and also somehow echo what @adrelanos said: the privacy aspect of shared machine-id is not a focus for non-Whonix VMs. We are not going to duplicate the effort there. And also, just machine-id is a very small part, not even in top 10 (or top 100) things to avoid linking VMs based on the same template (and as Patrick said, accessing it requires local execution, at which point there are a lot more ways to fingerprint a template). In fact, having the same machine-id across several users (all using the same template version) might improve privacy... The reason why making machine-id unique is considered in Qubes OS at all, is because having it shared may break some applications that assume it is unique and persistent. Its documentation suggests it may be used to derive application-specific unique identifiers and there may be applications relying on this feature. Something like this happened before (although it was about MAC address, not machine-id). |
Re. pulseaudio, mentioned earlier (again from machine-id's docs):
"If a stable unique identifier that is tied to the machine is needed for some application, the machine ID or any part of it must not be used directly. Instead the machine ID should be hashed with a cryptographic, keyed hash function, using a fixed, application-specific key."
The fact that pulseaudio uses it directly is an issue with pulseaudio, i.e. unlikely to be something Qubes OS is supposed to take care of. They know about it:
https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/issues/1123
That principle seems applicable to all other software, i.e. we probably don't need a complete list.
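A minimal Python sketch of what the quoted documentation asks applications to do: hash the machine ID with a keyed hash (HMAC-SHA256 here) under a fixed, application-specific key, instead of using the ID directly. The function name and the example key are assumptions for illustration. systemd offers `sd_id128_get_machine_app_specific()` for this purpose, and this sketch does not claim to produce the same values.

```python
import hashlib
import hmac


def app_specific_id(machine_id: str, app_key: bytes) -> str:
    """Derive a stable per-application ID without exposing the machine ID."""
    raw = bytes.fromhex(machine_id.strip())  # raises ValueError on bad hex
    if len(raw) != 16:
        raise ValueError("machine ID must be 32 hex characters")
    # Keyed hash: the application-specific key is the HMAC key,
    # the machine ID is the message. Truncate to 128 bits.
    return hmac.new(app_key, raw, hashlib.sha256).hexdigest()[:32]


if __name__ == "__main__":
    # Both values are made up for illustration.
    example_id = "0123456789abcdef0123456789abcdef"
    print(app_specific_id(example_id, b"example-app-key"))
```

An application that stores settings under such a derived ID still gets a stable name per machine, but the raw machine ID never leaves the application.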
@adrelanos
My previous reply was not intended to offend anyone or to start an extraneous discussion. It was a response to others who mentioned Whonix.
To avoid further off-topic, I will answer only the machine-id related things you mention in Qubes OS context. If you would like to discuss other things, please link to a relevant thread and we can do that.
> Which software is reading `/etc/machine-id` under which circumstances?
I can open file:///etc/machine-id in both Firefox (Fedora) and Tor Browser (Whonix) (under no special circumstances), which means
- it is not confidential
- it is exposed in untrusted (networked) environment
Whether a browser extension or other JS can access it is at the mercy of the browser (sandboxing). As for other software, I have not investigated. The file is user readable. Even if it wasn't, in Qubes OS root is passwordless by default.
> Only locally running tracking software, which is either malware or software with anti-features.
Or non-malware downloading and running arbitrary code (AKA JavaScript) that exploits bugs/vulnerabilities.
> Until local fingerprinting protection gets implemented, it's best to avoid running such software even inside VMs. Such a feature ever getting invented however I called "realistically impossible".

I would be glad to know how to avoid running JavaScript in a JS-dependent forum or bug tracker, as well as why a well-known privacy-invasive technology is deliberately chosen for privacy focused projects. Please link to where I can learn how to do that.

> Locally running tracking software can find out under which operating system it is running anyhow (same for Windows, Debian, Qubes, Tails, Whonix, ...). Hiding this is again, realistically impossible.
Just because it is possible for locally running sophisticated malware to detect the OS (or that it is running in a VM), does not mean that:
- all malware is sophisticated enough to do this
- the OS should simply deliver a ready-made boot-resistant identifier in a well-known place, so every software can simply read it
> deliberately facilitated?
Yes. Using the same machine-id for all users facilitates detection that "It is a this-OS user", thus not requiring any additional detection mechanisms from potential malware. This makes it possible even for simplest malware to find out. Not having a persistent machine-id would at least make it more difficult, thus reducing the probability of easy fingerprinting.
> deliberately public?
Whonix's machine-id is public info. Qubes templates are also publicly accessible.
> How exactly does the shared `/etc/machine-id` lead back to your real identity?
By reducing the noise in the system. Volatile parameters increase noise, making system identification more difficult.
> Now you want to include a threat model where tracking software is running with local code execution.
It is not that I want that. It just seems part of distrusting the infrastructure (which includes pretty much everything except Xen, dom0 and the distro itself).
> All for free or even against payment.
The same applies to bug reporters. :)
|
Coming first to mind, + screen resolution:
How much percent certainty would be too much? 10%? 50%? I guess even a 10% certainty would be considered too much.
Would need to show the diff of the different VM images. Maybe using
That would be quite catastrophic.
That's already covered by "malware".
JS is pretty much off-topic. (If used for fingerprinting, that's remote fingerprinting, not local fingerprinting.) Here are some related links:
Which malware at all looks at What I mean to say, this is not a realistic threat model.
There are way too many of these places.
It would probably cost way less than 10000 USD to develop a library of reliable OS detection that can defeat superficial OS hiding attempts.
The simplest malware doesn't run on Linux, doesn't do OS detection or attempt to detect Qubes. More sophisticated malware might use anti-VM techniques to avoid detection and analysis. Sophisticated, tailored malware against Qubes would probably use something like
Not really as this isn't the canonical way to detect Qubes and there are many other more simple, common ways to detect Qubes. So if you want anti-Qubes detection feature, I suggest opening a separate ticket (if this ticket wasn't clear enough).
Since the canonical way to detect Qubes VM is quote "Check Unless Qubes chooses to implement anti-OS detection (which as seen in this ticket the answer apparently is "no"), I don't think it makes sense to modify |
> JS is pretty much off-topic.
So are xrandr, diffoscope, Whonix and what not.
> What I mean to say, this is not a realistic threat model.
When you introduce an avalanche of questions and someone spends time to answer them, after which you swiftly brush away observable verifiable facts as "not realistic", there isn't much to say further.
|
I am now convinced that it would be better to have:
* A) `/etc/machine-id` in App Qube; different from
* B) `/etc/machine-id` in Template.

reasons:
Should work. I looked up the manual just now. Seems pretty clear. https://www.freedesktop.org/software/systemd/man/latest/machine-id.html
This gives nice flexibility to use different IDs for App Qubes vs Template. On topic, (and please correct me if I am wrong):
|
> I am now convinced that it would be better to have:
> * A) `/etc/machine-id` in App Qube; different from
> * B) `/etc/machine-id` in Template.
:)
> On topic, (and please correct me if I am wrong):
The whole subject of fingerprinting is off-topic (although it is related and worth discussing separately). It is just too big to fit here (and implies many more issues). If you have a proper discussion thread about it, share a link to it. Maybe we can figure what can be improved.
|
I brought up that topic because you mentioned in the original post here in context of privacy but that is only relevant in case of local code execution. When considering that however that opens up the full blown local code execution anti-fingerprinting discussion. It however makes sense to ignore privacy in this ticket and only go for different machine IDs in Template vs App Qube for the purpose of not confusing systemd journal.
Right.
Right. Such features would be best if described more generally, more thoroughly, requested more directly.
That's for sure.
The ones I collected so far:
Other than that you could look at my feature requests descriptions on https://www.kicksecure.com/wiki/System_identity_camouflage, see what you agree with or not, rephrase and then post bug reports and/or feature requests against any responsible projects such as Linux, Xen, Debian, Fedora, ... Feature request against Kicksecure, Whonix: Not needed. Above forum threads could be considered the feature request and " Qubes feature request: After this ticket and
I don't think any more tickets would be promising, would just be kinda duplicates, but that's just my opinion, not speaking for Qubes. |
> I brought up that topic because you mentioned in the original post here in context of privacy
I only mentioned that it has privacy implications, i.e. for potential consideration in a broader context.
> but that is only relevant in case of local code execution.
I wonder why you keep talking about this as if non-local one exists. Whether a file is downloaded and run, or JS runs inside browser - it is the local CPU that runs it.
> It however makes sense to ignore privacy in this ticket and only go for different machine IDs in Template vs App Qube for the purpose of not confusing systemd journal.
Agreed.
I will look at the links later. Thanks.
|
There's a strong and important boundary. It's true that the local CPU runs it, but there are two different concepts:
* JS run "remotely" from a remote website in a local browser: cannot read CPUID.
* JS run locally (Node.js): can read CPUID.

So local fingerprinting is a lot worse than remote fingerprinting, which is subject to browser restrictions. For example, at least there's no way to read CPUID remotely through a website (excluding vulnerabilities leading to remote code execution). |
### Qubes OS release

4.1.2

### Brief summary

Currently, all VMs based on a particular template inherit its `/etc/machine-id`, because it is persistent. This has privacy implications.

From machine-id documentation:

"This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network. If a stable unique identifier that is tied to the machine is needed for some application, the machine ID or any part of it must not be used directly."

### Steps to reproduce

`cat /etc/machine-id` in template and VMs using it.

### Expected behavior

Qubes OS's templates are essentially golden images. As also described in systemd's documentation, "each instance should automatically acquire its own identifying credentials on first boot", i.e. `/etc/machine-id` must not be shared across qubes.

### Actual behavior

All qubes based on a certain template have template's `/etc/machine-id`.

A simple and effective solution is to run this in the template:

After that, on each boot, the VM will have a new unique machine-id.

The last command ensures that journal will be volatile too (thus, not exercise unnecessary writes to SSDs).

Related issue: #8832
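For anyone who wants to check the reported behavior across several qubes, here is a small Python sketch. It assumes the IDs were already collected by hand, for example by running `cat /etc/machine-id` in each qube, and it only groups them. The qube names and ID values are made up, and both helper functions are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# machine-id format: 32 lowercase hexadecimal characters.
MACHINE_ID_RE = re.compile(r"[0-9a-f]{32}")


def is_valid_machine_id(value: str) -> bool:
    """Check the format of a machine-id string (all zeros is not allowed)."""
    value = value.strip()
    return bool(MACHINE_ID_RE.fullmatch(value)) and value != "0" * 32


def shared_machine_ids(ids_by_qube: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every machine-id that is used by more than one qube."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for qube, machine_id in ids_by_qube.items():
        groups[machine_id.strip()].append(qube)
    return {mid: sorted(qubes) for mid, qubes in groups.items() if len(qubes) > 1}


if __name__ == "__main__":
    # Made-up values standing in for the output of `cat /etc/machine-id`.
    collected = {
        "example-template": "a" * 32,
        "example-appvm-1": "a" * 32,
        "example-appvm-2": "b" * 32,
    }
    for machine_id, qubes in shared_machine_ids(collected).items():
        print(machine_id, "is shared by", ", ".join(qubes))
```

With the behavior described in this report, the template and every qube based on it would land in the same group.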