Memory Consumption
Loading all 71 language profiles takes about 74 MB of RAM to hold the data in memory.
The more language profiles you load, the more memory is used. But be careful: if you remove languages that might occur in your input, you will get bad detection results.
The List<LanguageProfile> uses 26.74 MB of RAM, but that is only an intermediate state. Once the NgramFrequencyData is created, the profiles can be garbage collected (don't keep a reference to them). Using String.intern() when loading the language profiles reduces their size to 19.12 MB, but it makes no difference in the NgramFrequencyData, because there every string (n-gram) exists only once, as a key in a HashMap. So never mind that.
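To illustrate what String.intern() does here, the following is a minimal sketch with toy data. The class and method names (InternSketch, internAll) are made up for this example and are not part of the language-detector code base.

```java
import java.util.ArrayList;
import java.util.List;

class InternSketch {

    /** Returns a new list in which equal strings share a single instance. */
    static List<String> internAll(List<String> grams) {
        List<String> result = new ArrayList<>(grams.size());
        for (String gram : grams) {
            // intern() returns the canonical instance from the JVM string pool
            result.add(gram.intern());
        }
        return result;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal content,
        // similar to the same n-gram being read from two profile files
        List<String> grams = new ArrayList<>();
        grams.add(new String("the"));
        grams.add(new String("the"));

        List<String> interned = internAll(grams);
        System.out.println("same object before intern: " + (grams.get(0) == grams.get(1)));
        System.out.println("same object after intern:  " + (interned.get(0) == interned.get(1)));
    }
}
```

The same n-gram appears in many profiles, which is why interning shrinks the profile list. A HashMap key set already holds each n-gram once, so nothing is left to share there.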
At runtime the memory is consumed by the NgramFrequencyData class, specifically its Map<String, double[]> wordLangProbMap.
In a quick test I replaced this map with a Map<String, float[]>. That means 32 bits per float instead of 64 bits per double. With 71 language profiles, each array has 71 values. With this change, memory consumption went down from 74.05 MB to 43.25 MB, which is only 58% of the original. All unit tests passed, so float would be precise enough. Saving a few bytes (megabytes) just never was a consideration.
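The narrowing from double[] to float[] could look roughly like the sketch below. This is not the code from the quick test, only an illustration on toy data. The class FloatProbabilities and the method toFloatMap are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class FloatProbabilities {

    /** Copies an n-gram map, narrowing each double[] to a float[] of the same length. */
    static Map<String, float[]> toFloatMap(Map<String, double[]> source) {
        Map<String, float[]> target = new HashMap<>(source.size() * 2);
        for (Map.Entry<String, double[]> entry : source.entrySet()) {
            double[] doubles = entry.getValue();
            float[] floats = new float[doubles.length];
            for (int i = 0; i < doubles.length; i++) {
                // the cast keeps roughly 7 significant decimal digits
                floats[i] = (float) doubles[i];
            }
            target.put(entry.getKey(), floats);
        }
        return target;
    }

    public static void main(String[] args) {
        // toy data: one n-gram with probabilities for three languages
        Map<String, double[]> wordLangProbMap = new HashMap<>();
        wordLangProbMap.put("th", new double[]{0.25, 0.5, 0.0});

        float[] probs = toFloatMap(wordLangProbMap).get("th");
        System.out.println(probs.length + " values, first = " + probs[0]);
    }
}
```

In a real change the map would be built with float[] from the start rather than converted afterwards, so that the double[] version never has to exist in memory.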
Another option is to use the Trove Java collections library instead of a JDK HashMap. Then a char[] could be used as the map key instead of an expensive String. This means either including another library or copy-pasting some code. See http://trove.starlight-systems.com/overview, which has an example with char[] and CharArrayStrategy implements TObjectHashingStrategy.
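Why a hashing strategy is needed at all can be shown with the JDK alone. Java arrays inherit identity-based equals() and hashCode(), so a plain HashMap cannot find a char[] key by content. The sketch below uses a hypothetical CharArrayHashing interface as a stand-in for the strategy type. It is not Trove's actual API, and the method names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class CharArrayKeySketch {

    /** Hypothetical stand-in for a hashing-strategy interface; not Trove's actual type. */
    interface CharArrayHashing {
        int computeHashCode(char[] key);
        boolean equals(char[] a, char[] b);
    }

    /** Content-based hashing and equality for char[] keys. */
    static final CharArrayHashing BY_CONTENT = new CharArrayHashing() {
        @Override
        public int computeHashCode(char[] key) {
            return Arrays.hashCode(key);
        }

        @Override
        public boolean equals(char[] a, char[] b) {
            return Arrays.equals(a, b);
        }
    };

    /** A JDK HashMap compares char[] keys by identity, so this lookup misses. */
    static boolean jdkMapFindsEqualContent() {
        Map<char[], Integer> map = new HashMap<>();
        map.put("the".toCharArray(), 1);
        return map.containsKey("the".toCharArray()); // a different array object
    }

    public static void main(String[] args) {
        char[] a = "the".toCharArray();
        char[] b = "the".toCharArray();
        System.out.println("JDK HashMap finds equal content: " + jdkMapFindsEqualContent());
        System.out.println("strategy says equal: " + BY_CONTENT.equals(a, b));
        System.out.println("strategy hashes match: "
                + (BY_CONTENT.computeHashCode(a) == BY_CONTENT.computeHashCode(b)));
    }
}
```

A map that accepts such a strategy can store the raw char[] directly. Wrapping the array in a key object with its own equals() and hashCode() would also work with the JDK HashMap, but it adds one more object per entry, which defeats the purpose.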
Java 9 will reduce the amount of memory consumed by strings.
- Latin-1 strings will use half the memory: http://openjdk.java.net/jeps/254
- Space improvement for interned strings, and sharing among multiple JVM instances: http://openjdk.java.net/jeps/250
- Then there's the feature for automatic string deduplication in G1: http://openjdk.java.net/jeps/192 and G1 will become the default garbage collector http://openjdk.java.net/jeps/248
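String deduplication in G1 is switched on with the JVM options -XX:+UseG1GC and -XX:+UseStringDeduplication. It is off unless requested. As a small sketch, a program can check whether it was started with these options. The class name StringDedupCheck and the helper hasFlag are made up for this example, and the check only sees options passed explicitly on the command line, not JVM defaults.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class StringDedupCheck {

    /** Returns true if the given JVM option was passed on the command line. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // the options the JVM was started with, e.g. "-Xmx512m"
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("G1 requested:            " + hasFlag(jvmArgs, "-XX:+UseG1GC"));
        System.out.println("deduplication requested: " + hasFlag(jvmArgs, "-XX:+UseStringDeduplication"));
    }
}
```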
I believe that for most users, this is not a consideration. Most apps run on servers with plenty of RAM.
If you still get charged an arm and a leg for RAM, you may want to consider Hetzner, a German host. My employer, which runs http://www.nameapi.org/, is a satisfied customer.
For example, this machine: https://www.hetzner.de/de/hosting/produkte_rootserver/px61ssd with 64 GB DDR4 ECC, Intel® Xeon® E3-1275 v5 Quad-Core Skylake and 2 x 480 GB SSD, for EUR 70 monthly. Customers outside the EU get the 19% VAT deducted. No affiliation.
If you run the language detector on mobile devices, you may want to look at the fork by user eclectice at https://github.com/eclectice/language-detector (Gradle, short text profiles), or at another version of the original software, set up as a Maven multi-module project: https://github.com/rmtheis/language-detection
I (Fabian) have no experience with Android.
To measure the memory consumption yourself, use the memory-measurer from https://github.com/DimitrisAndreou/memory-measurer
- download the object-explorer.jar
- add the jar to the language-detector project in your IDE. In IntelliJ: click File, Project Structure, Libraries, the green + sign, then select the jar from your disk
- when running, add this to the VM options: -javaagent:/path/to/object-explorer.jar
This code loads all profiles, creates the NgramFrequencyData, then measures and prints.

    import java.io.IOException;
    import java.util.List;

    import objectexplorer.MemoryMeasurer;
    import objectexplorer.ObjectGraphMeasurer;

    @Test
    public void testMemory() throws IOException {
        List<LanguageProfile> languageProfiles = new LanguageProfileReader().readAllBuiltIn();
        NgramFrequencyData ngramFrequencyData = NgramFrequencyData.create(
                languageProfiles,
                NgramExtractors.standard().getGramLengths() // that is 1-, 2- and 3-grams
        );
        assertEquals(languageProfiles.size(), 71);

        // sanity check: total number of n-grams across all profiles
        int totalGramsAllProfiles = 0;
        for (LanguageProfile languageProfile : languageProfiles) {
            totalGramsAllProfiles += languageProfile.getNumGrams();
        }
        assertEquals(totalGramsAllProfiles, 281920);

        measureAndPrint(languageProfiles);   // intermediate state
        measureAndPrint(ngramFrequencyData); // what stays in memory at runtime
    }

    // Prints the deep size of the object graph and a breakdown of what it contains.
    private void measureAndPrint(Object o) {
        long memory = MemoryMeasurer.measureBytes(o);
        ObjectGraphMeasurer.Footprint footprint = ObjectGraphMeasurer.measure(o);
        System.out.println("Bytes: " + memory);
        System.out.println("Kilobytes: " + String.format("%.2f", memory / 1024.0));
        System.out.println("Megabytes: " + String.format("%.2f", memory / 1024.0 / 1024.0));
        System.out.println(footprint);
    }

The values I got on 2016-10-07 are:
Bytes: 77646712
Kilobytes: 75826.87
Megabytes: 74.05
Footprint{Objects=461627, References=723918, Primitives=[double x 8189850, char x 309517, int x 346202, float]}
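The Footprint line can be cross-checked with some arithmetic. The sketch below takes the double count from the output above and assumes, as stated earlier, that every array holds 71 values. It counts only the raw primitive payload and ignores array headers, map entries and strings. The class name FootprintArithmetic is made up for this example.

```java
import java.util.Locale;

class FootprintArithmetic {

    static final long DOUBLES = 8_189_850L; // "double x 8189850" from the Footprint line
    static final int LANGUAGES = 71;        // one array slot per language profile

    /** Number of distinct n-grams, assuming 71 values per array. */
    static long distinctNgrams() {
        return DOUBLES / LANGUAGES;
    }

    /** Raw payload of all values stored as double (8 bytes each). */
    static long doublePayloadBytes() {
        return DOUBLES * Double.BYTES;
    }

    /** Raw payload of the same values stored as float (4 bytes each). */
    static long floatPayloadBytes() {
        return DOUBLES * Float.BYTES;
    }

    static double toMegabytes(long bytes) {
        return bytes / 1024.0 / 1024.0;
    }

    public static void main(String[] args) {
        System.out.println("distinct n-grams: " + distinctNgrams());
        System.out.println(String.format(Locale.ROOT, "double payload: %.2f MB",
                toMegabytes(doublePayloadBytes())));
        System.out.println(String.format(Locale.ROOT, "float payload:  %.2f MB",
                toMegabytes(floatPayloadBytes())));
    }
}
```

The double count divides evenly by 71, giving 115350 distinct n-grams. The doubles alone account for about 62.48 MB of the 74.05 MB, and storing them as floats would save about 31.24 MB, which is close to the measured drop from 74.05 MB to 43.25 MB.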