Making ExternalSpillableMap generic for any datatype #350
Conversation
n3nash commented on Mar 15, 2018
@@ -457,6 +468,11 @@ public Builder withFinalizeWriteParallelism(int parallelism) {
    return this;
  }

  public Builder withMaxMemoryPerPartitionMerge(long maxMemoryPerPartitionMerge) {
Calculate this based on executor memory and memory fraction.
@@ -64,6 +64,9 @@
  private static final String DEFAULT_HOODIE_COPYONWRITE_USE_TEMP_FOLDER_MERGE = "false";
  private static final String FINALIZE_WRITE_PARALLELISM = "hoodie.finalize.write.parallelism";
  private static final String DEFAULT_FINALIZE_WRITE_PARALLELISM = DEFAULT_PARALLELISM;
  private static final String MAX_SIZE_IN_MEMORY_FOR_MERGE_IN_BYTES_PROP = "hoodie.merge.spill.threshold";
  // Default max memory size during hash-merge, excess spills to disk
  private static final String DEFAULT_MAX_SIZE_IN_MEMORY_FOR_MERGE_IN_BYTES = String.valueOf(1024 * 1024 * 1024L); // 1GB
Make it fraction instead of absolute value.
@@ -156,7 +159,8 @@ public R put(T key, R value) {
    if (this.currentInMemoryMapSize < maxInMemorySizeInBytes || inMemoryMap.containsKey(key)) {
      // Naive approach for now
      if (estimatedPayloadSize == 0) {
        this.estimatedPayloadSize = SpillableMapUtils.computePayloadSize(value, schema);
        this.estimatedPayloadSize = SpillableMapUtils.computePayloadSize(key, this.keyConverter) +
use SizeEstimator from spark.
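For reference, a minimal sketch of this suggestion, assuming spark-core is on the classpath; the key/value arguments are placeholders, not the PR's actual fields:

```java
import org.apache.spark.util.SizeEstimator;

// Sketch only: SizeEstimator walks the object graph and returns an approximate
// deep size in bytes, so a per-entry estimate could be computed like this.
public class SparkSizeEstimateSketch {
  public static long estimateEntrySize(Object key, Object value) {
    return SizeEstimator.estimate(key) + SizeEstimator.estimate(value);
  }
}
```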
Force-pushed from e47bffe to 60a10db
@vinothchandar please take a pass at this when you get a chance.
Need to review tests & API usage more in detail. Will do in next pass.
 */
final public class DiskBasedMap<T,R> implements Map<T,R> {
final public class DiskBasedMap<T, R> implements Map<T, R> {

  // Stores the key and corresponding value's latest metadata spilled to disk
  final private Map<T, ValueMetadata> inMemoryMetadataOfSpilledData;
rename to simply valueMetadataMap? Spilling vs in-memory is only at the ExternalSpillableMap level, correct? This whole class is just disk-based, i.e. spilling.
done
 * writes.
 * @param <T>
 * @param <R>
 * An external map that spills content to disk when there is insufficient space for it to grow. <p>
is this all resulting from auto-formatting on save?
It results from the fact that earlier check-ins were not following Google style; now every check-in does.
okay.. the checkstyle stuff will hopefully fix this for good.
        currentInMemoryMapSize += this.estimatedPayloadSize;
      }
      inMemoryMap.put(key, value);
    } else {
      diskBasedMap.put(key, value);
    }
    return value;
  } catch(IOException io) {
    throw new HoodieIOException("Unable to estimate size of payload", io);
  } catch (Exception io) {
Exception e
removed exception
      throw new UncheckedIOException(io);
    }
  }
}

@Test
public void testSizeEstimator() throws IOException, URISyntaxException {
have you done a micro benchmark for the sizeEstimator to see if it won't become a bottleneck, when called in a loop?
It's not called in a loop, it's just called once.
okay sg.
docs/configurations.md
Outdated
@@ -23,7 +23,9 @@ summary: "Here we list all possible configurations and what they mean"
<span style="color:grey">Should HoodieWriteClient autoCommit after insert and upsert. The client can choose to turn off auto-commit and commit on a "defined success condition"</span>
- [withAssumeDatePartitioning](#withAssumeDatePartitioning) (assumeDatePartitioning = false) <br/>
<span style="color:grey">Should HoodieWriteClient assume the data is partitioned by dates, i.e three levels from base path. This is a stop-gap to support tables created by versions < 0.3.1. Will be removed eventually </span>

- [withMaxMemoryFractionPerPartitionMerge](#withMaxMemoryFractionPerPartitionMerge) (maxMemoryFractionPerPartitionMerge = 0.6) <br/>
<span style="color:grey">This fraction is multiplied with the spark.memory.fraction to get a final fraction of heap space to use during merge </span>
We should also document how this interplays with spark.memory.storageFraction, i.e. this + that <= 1.0, otherwise things will OOM.
Actually, this is just defining what fraction of spark.memory.fraction is used for the in-memory map in MergeHandle. So if the client uses Spark configs incorrectly, Spark itself will cause issues before it hits the MergeHandle code.
I still think it's worth documenting:
http://spark.apache.org/docs/2.1.1/tuning.html#memory-management-overview
Usually, spark.memory.fraction - spark.memory.storageFraction is left for internal data structures of Spark. Heap - spark.memory.fraction is left for user data structures, in this case Hoodie's ExternalSpillableMap.
I don't actually follow:
- Why we multiply this fraction against the Spark fraction instead of total heap space. By definition, we cannot dip into the Spark fraction.
- How the Spark app can be stable if the Spark fraction is 0.6 and this also takes 0.6 * 0.6 of heap, which brings total usage to 0.6 + 0.36 = 0.96, or 96% of the heap, which will keep GCing back to back.
The definitions in this link (https://spark.apache.org/docs/latest/configuration.html) and the one you pasted (which is the description link within it) are a little misleading. I think there was an oversight on my part in choosing to multiply by spark.memory.fraction. Instead, I should multiply by 1 - spark.memory.fraction. It's clearer here: https://0x0fff.com/spark-memory-management/.
I don't want to rely on heap space to calculate the spillable map size since the heap is not the real user memory left by the Spark memory model.
So my thought process is: executor.memory = heap. As you increase executor.memory you increase the heap, and we want our spillable map memory size to grow accordingly, hence:
user.available.memory = executor.memory * (1 - spark.memory.fraction)
spillable.available.memory = user.available.memory * merge.memory.fraction
spillable.available.memory = user.available.memory*merge.memory.fraction
sure, this is what I was getting at in the comments above. This should be okay.
@n3nash let's reflect this discussion in the docs above? Multiplying by the Spark fraction is definitely misleading.
done
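To make the agreed convention concrete, here is a small sketch of the arithmetic discussed above; the method and variable names are illustrative, not the ones used in the PR:

```java
/**
 * Rough sketch of the merge-memory math discussed above; property names and
 * defaults here are illustrative only.
 */
public class MergeMemorySketch {
  public static long maxMemoryForMerge(long executorMemoryBytes,
                                       double sparkMemoryFraction,
                                       double mergeMemoryFraction) {
    // Heap left for user data structures under the Spark unified memory model.
    double userAvailableMemory = executorMemoryBytes * (1 - sparkMemoryFraction);
    // Fraction of that user memory handed to the spillable map during merge.
    return (long) (userAvailableMemory * mergeMemoryFraction);
  }

  public static void main(String[] args) {
    // e.g. 4 GB executor, spark.memory.fraction = 0.6, merge fraction = 0.6
    long max = maxMemoryForMerge(4L * 1024 * 1024 * 1024, 0.6, 0.6);
    System.out.println("Max in-memory merge size (bytes): " + max);
  }
}
```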

@Override
public byte[] getBytes(String s) {
  return s.getBytes();
need to pay attention to the charset for consistency.. please have it use UTF-8 explicitly..
good catch, done.
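A minimal sketch of the charset fix, assuming a converter-style getBytes method like the one in the diff:

```java
import java.nio.charset.StandardCharsets;

public class StringConverterSketch {
  public byte[] getBytes(String s) {
    // Pin the charset explicitly so the spilled bytes are identical across JVMs,
    // instead of depending on the platform default charset.
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```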
/**
 * A default converter implementation for HoodieRecord
 */
public class DefaultValueConverter<V> implements
rename to HoodieRecordConvertor?
done
Field[] fields = clazz.getDeclaredFields();
Optional<Field> fieldWithSchema = Arrays.stream(fields)
    .filter(field -> {
      if (field.getType() == Schema.class) {
can we merge these if statements?
done
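For illustration, a sketch of the merged-predicate idea; the second condition is a stand-in, since the original nested check isn't visible in the diff:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Optional;
import org.apache.avro.Schema;

public class SchemaFieldLookupSketch {
  // Single filter predicate instead of nested if statements inside the lambda.
  static Optional<Field> findSchemaField(Class<?> clazz) {
    return Arrays.stream(clazz.getDeclaredFields())
        .filter(field -> field.getType() == Schema.class
            && !Modifier.isStatic(field.getModifiers()))
        .findFirst();
  }
}
```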
// TODO : Find a generic way to figure out the true size of the record in any scenario
long sizeOfRecord = ObjectSizeCalculator.getObjectSize(hoodieRecord);
long sizeOfSchema = ObjectSizeCalculator.getObjectSize(schema);
log.info("SizeOfRecord => " + sizeOfRecord);
single log statement.. also should this be info?
Yeah, INFO seems fine, it's only logged once right now since the size estimate is done once.
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;

public class TestAvroPayload implements HoodieRecordPayload {
can we extend or reuse an existing Payload class?
Unfortunately not, this is a special payload implementation which holds only bytes.
rename : AvroBinaryTestPayload
done
Looked at API changes.. LGTM
Force-pushed from 60a10db to 27bf9a1
@vinothchandar I've addressed your comments and left some more comments. Please take a pass. I'm going to do some microbenchmarking on the sizeEstimator in the meantime and report back.
Force-pushed from 27bf9a1 to 7128aeb
The sizeEstimator, which uses Twitter's ObjectSizeCalculator, shows a min of 95 ms and a max of 180 ms, as below:
18/03/22 07:18:49 INFO collection.ExternalSpillableMap: Estimated Payload size => 1049008 Time taken 126
For now, I will live with calculating the payload size once at the beginning; if that shows problems, I will invest in a config-based every-N-records size estimate.
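For context, a rough sketch of the kind of timing check behind the numbers above; the payload object is a placeholder, and the import assumes the Twitter ObjectSizeCalculator already used elsewhere in this PR:

```java
import com.twitter.common.objectsize.ObjectSizeCalculator;
import java.util.HashMap;
import java.util.Map;

public class SizeEstimatorBenchSketch {
  public static void main(String[] args) {
    // Stand-in payload; in the PR this would be a HoodieRecord-style object.
    Map<String, String> payload = new HashMap<>();
    for (int i = 0; i < 10_000; i++) {
      payload.put("key-" + i, "value-" + i);
    }
    long start = System.currentTimeMillis();
    long size = ObjectSizeCalculator.getObjectSize(payload);
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("Estimated Payload size => " + size + " Time taken " + elapsed);
  }
}
```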
Force-pushed from 7128aeb to 9955e5c
@vinothchandar I did a little more testing and realized it's really difficult to come up with a concrete object size for an entry in the hashmap in the JVM, since objects can be shared, etc. I'm wondering whether the spillableMap size should be made config-based for now? Need to spend more time to understand how to do object sizing so that records don't spill to disk when they don't need to. WDYT?
Can you provide me more context into what's going wrong?
All sizes in bytes.
Force-pushed from 9955e5c to 47583a0
I think holding it serialized could address some of these concerns? Taking a page out of Spark's book, this is why serialized in-memory storage is better/more compact at more CPU cost.. is that an option?
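As a sketch of the "hold it serialized" idea (not what the PR implements): keep values as byte[] so the in-memory footprint is simply the byte length, at the cost of deserializing on every read. The Converter interface here mirrors the PR's converter concept but is written out for self-containment:

```java
import java.util.HashMap;
import java.util.Map;

public class SerializedInMemoryMapSketch<T, R> {

  /** Stand-in for the PR's converter abstraction. */
  public interface Converter<R> {
    byte[] getBytes(R value);
    R getValue(byte[] bytes);
  }

  private final Map<T, byte[]> serializedValues = new HashMap<>();
  private final Converter<R> valueConverter;
  private long currentSizeInBytes = 0;

  public SerializedInMemoryMapSketch(Converter<R> valueConverter) {
    this.valueConverter = valueConverter;
  }

  public void put(T key, R value) {
    byte[] bytes = valueConverter.getBytes(value);
    currentSizeInBytes += bytes.length; // exact size, no object-graph estimation needed
    serializedValues.put(key, bytes);
  }

  public R get(T key) {
    byte[] bytes = serializedValues.get(key);
    return bytes == null ? null : valueConverter.getValue(bytes); // deserialize on access (CPU cost)
  }
}
```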
  return this;
}

// Dynamic calculation of maxMemory to use for merge
Javadoc-style comments
done

// This fraction is multiplied with the spark.memory.fraction to get a final fraction of heap space to use during merge
// This makes it easier to scale this value as one increases the spark.executor.memory
public static final String MAX_MEMORY_FRACTION_FOR_MERGE_PROP = "hoodie.merge.memory.fraction";
rename to hoodie.memory.merge.fraction, in sync with how other configs are named?
done
// This fraction is multiplied with the spark.memory.fraction to get a final fraction of heap space to use during merge
// This makes it easier to scale this value as one increases the spark.executor.memory
public static final String MAX_MEMORY_FRACTION_FOR_MERGE_PROP = "hoodie.merge.memory.fraction";
public static final String MAX_MEMORY_FOR_MERGE_PROP = "hoodie.max.merge.memory";
same here.. different name..
done
Force-pushed from 5fb6a9d to 5011a19
@vinothchandar Found a better (and simpler) way to accurately size an entry in a map that works well for any type of payload. I've incorporated that and your comments, please take a pass.
Force-pushed from 5011a19 to 4933211
      keyConverter.sizeEstimate(key) + valueConverter.sizeEstimate(value);
  log.info("Estimated Payload size => " + estimatedPayloadSize);
}
else if (shouldEstimatePayloadSize &&
@vinothchandar Simple handling of sizes.
how does estimating the size of entire map alleviate the issues you mentioned before?
The inherent problem lies in the fact that an entry in the Map by itself is not a good gauge of how large the record is in the heap (as discussed before, mainly due to shared objects). What is essentially required is to estimate the size of a record given N records collectively, so that the size of the shared object is amortized over N records and we get as close to the actual payload size as possible. Here, estimating the size of the map does that for us. It's still not dead accurate but comes very close. I chose the number 100 after trying a couple of different payloads, nested and non-nested with shared objects, and I saw the size of the shared object get amortized. We start by estimating the payload size for one record and use that till we reach 100 records. We do this since we don't want to OOM given a small memory setting and a large record. Then we update the payload estimate by calculating the size of the HashMap / N. We could choose a higher number, say 1000, at which to re-estimate; this might give us better amortization, but the fear is that because of overestimation we may already start spilling to disk by then.
The other option is to keep updating the estimate logarithmically, but my fear is that for a huge HashMap, I'm not sure how the ObjectSizeEstimator performs in terms of CPU and memory usage.
let's do a follow-on task to fix this more systematically.. if 100 works for now, maybe we can keep it as it is.
IIUC you are saying that by estimating at a higher point in the object tree (graph), the sizing is more accurate? If so, then that makes sense..
Yes, that's what I'm saying.
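A condensed sketch of the amortization approach described above; field and constant names are illustrative rather than the exact ones in the PR:

```java
import com.twitter.common.objectsize.ObjectSizeCalculator;
import java.util.HashMap;
import java.util.Map;

public class AmortizedPayloadSizeSketch<T, R> {
  private static final int NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE = 100;

  private final Map<T, R> inMemoryMap = new HashMap<>();
  private long estimatedPayloadSize = 0;

  public void put(T key, R value) {
    if (estimatedPayloadSize == 0) {
      // First record: estimate from a single entry so a large record on a small heap spills early.
      estimatedPayloadSize = ObjectSizeCalculator.getObjectSize(key)
          + ObjectSizeCalculator.getObjectSize(value);
    } else if (inMemoryMap.size() == NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE) {
      // After N records: re-estimate from the whole map so shared objects are counted once
      // and amortized across the N entries.
      estimatedPayloadSize = ObjectSizeCalculator.getObjectSize(inMemoryMap)
          / NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE;
    }
    inMemoryMap.put(key, value);
  }

  public long getEstimatedPayloadSize() {
    return estimatedPayloadSize;
  }
}
```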
// Value converter to convert value type to bytes
final private Converter<R> valueConverter;
// Find the actual estimated payload size after inserting N records
final private static int NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE = 100;
nit: can you move static member to the top before any instance variables.
done.
  return value;
} catch(IOException io) {
  throw new HoodieIOException("Unable to estimate size of payload", io);
if (!inMemoryMap.containsKey(key)) {
I think we are assuming records are of the same size here and updates need not adjust for size.. worthy to leave a TODO here to revisit..
Yes, we are assuming that, done.
// Find the actual estimated payload size after inserting N records
final private static int NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE = 100;
// Flag to determine whether to stop re-estimating payload size
private boolean shouldEstimatePayloadSize = true;
Should we eliminate this flag and just re-estimate the hashmap size every X records or so, continuously? It can be a non-linear probe as well.. 100, 1000, 10000, 100000, to amortize cost?
Yeah, I thought of that but I'm unsure of the performance of the ObjectSizeEstimator for such large objects, say a HashMap with 100K entries. Do you think we should performance test that now and do this continuously ?
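For comparison, a sketch of the continuous non-linear probe being discussed; whether the estimator stays cheap on a very large map is exactly the open question raised above:

```java
import com.twitter.common.objectsize.ObjectSizeCalculator;
import java.util.Map;

public class NonLinearReestimateSketch {
  private long nextEstimationThreshold = 100;
  private long estimatedPayloadSize = 0;

  // Re-estimate whenever the map crosses the next power-of-ten threshold
  // (100, 1000, 10000, ...), so estimator cost is amortized over more inserts.
  void maybeReEstimate(Map<?, ?> inMemoryMap) {
    if (inMemoryMap.size() >= nextEstimationThreshold && !inMemoryMap.isEmpty()) {
      estimatedPayloadSize = ObjectSizeCalculator.getObjectSize(inMemoryMap) / inMemoryMap.size();
      nextEstimationThreshold *= 10;
    }
  }
}
```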
Force-pushed from 7797177 to 3c5d672
@n3nash we can move it back to IOHandle if that eases things. Maybe a static helper closer to the Map itself, since this math can be reused for compaction too? My original comment was about not having a lot of getters/setters for these Spark props, which the localizing of variables addressed.
- Introduced concept of converters to be able to serde generic datatypes for SpillableMap
- Fixed/added configs in Hoodie configs
- Changed HoodieMergeHandle to start using SpillableMap
Force-pushed from 3c5d672 to 588b5ee
@vinothchandar Revised the PR. Unfortunately, I cannot get the Spark defaults since they are hard-coded; I've added links in the comments to the places where they are hard-coded in the Spark code. If there are more changes you propose, can we merge this and create a follow-up ticket which I can take up in the following week?