Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile OASIS service for charge station based routing - Round 2 #349

Closed
CodeBear801 opened this issue May 19, 2020 · 11 comments · Fixed by #353
Closed

Profile OASIS service for charge station based routing - Round 2 #349

CodeBear801 opened this issue May 19, 2020 · 11 comments · Fixed by #353
Assignees
Labels
Prototype Proof of concept Refactor Rewrite existing code in order to improve its readability, reusability or structure Testing
Milestone

Comments

@CodeBear801
Copy link

CodeBear801 commented May 19, 2020

Subtask of #344, related with #345, next step is #352

Result Summary

Change Summary Worst Case System Time(seconds) Init Memory Init Time(seconds) More details
master(baseline) 49 3GB 30 #349 (comment)
Round 1's modification 38 3GB 30 #349 (comment)
Avoid object converting in getConnectivities and optimization inside that function 11 7.5GB 68 #349 (comment)
Avoid heavy structure conversion, tune hash table 5 7.5GB 68 #349 (comment), #349 (comment)

Test env: macpro(macpro 2013 32 core 64GB memory 1TB ssd)

Major target

  • Remove the time cost for generating temporary objects, move them to initialization
    • try to add a new step in ETL, input is current data, covert them directly into target data
    • need refactor weight part Retire multiple definition of weight in oasis #348
    • try to use slice + slice instead of map + slice, try to create logic node during initialization time
// QueryResult records topological query result
type QueryResult struct {
	StationID       string
	StationLocation *nav.Location
	Distance        float64
	Duration        float64
}

// QueryResult records topological query result
type QueryResult struct {
	StationID       string
	StationLocation *nav.Location
	Weight          *Wight
}

image
click here for full picture

m  map[nodeID]*queryHeapNodeInfo
m:  make(map[nodeID]*queryHeapNodeInfo, 15000), // change to slice?
@CodeBear801
Copy link
Author

Tricks for performance tune, change to a powerful server. It immediately jump to 10 seconds on macpro compare to 40~50 seconds on macbookpro.

Due to I increased the memory usage, its not sufficient to run benchmarks on 16GB of physical memory, I switched to a macpro.

Will do the following:

  • Adjust the initial benchmark to compare
  • Profile for the memory differences

@CodeBear801
Copy link
Author

CodeBear801 commented May 21, 2020

Update test base line due to switch benchmark device from macbookpro to macpro(macpro 2013 32 core 64GB memory 1TB ssd)

System time for base version

base version means original code from master.

Generate solutions for charge station based routing takes 49.735084 seconds.

While on macbookpro is 59~65 seconds.

cpu benchmark for base version

base version means original code from master.
cpu_old_05212020
Percentage is the same.

memory benchmark for base version

base version means original code from master.
mem_old_05212020
The same.

Init time for base version

Initialize OASIS resource manager takes 29.486075 seconds.

@CodeBear801
Copy link
Author

CodeBear801 commented May 21, 2020

Testing for Round 1's result on macpro, more info go to #345

System time

Generate solutions for charge station based routing takes 38.902622 seconds.

CPU profile

cpu_round1_05212020

Memory profile

No change

@CodeBear801
Copy link
Author

CodeBear801 commented May 21, 2020

Test based on commit 1328354 in task #348

System time

Generate solutions for charge station based routing takes 11.406948 seconds.

From 40 seconds to 11 seconds.

CPU profile

cpu_05212020

Memory profile

mem_05212020
memory increased from 3GB to 7.5GB

Init time

Initialize OASIS resource manager takes 68.845422 seconds.

increased from 30 seconds to 70 seconds

@CodeBear801
Copy link
Author

CodeBear801 commented May 21, 2020

Next step:

@CodeBear801
Copy link
Author

CodeBear801 commented May 22, 2020

Finally, I got a slim version of curve

This profile based on commit: 6bc2486, 3068da0, d0f2f94. More details could go to: #349 (comment)

System time

Generate solutions for charge station based routing takes 5.234221 seconds.

From 11 seconds to 5.2 seconds.

Summary

For the result, I think I could make 3.5 seconds of http handling time fall into 1 seconds by

  • pre-process data, provide functionality of getting place to place weight directly.

For the initialization part, it could be optimized by

  • pre-process data, not directly dump go's map by gob, but using slice to record data, I think it will make init time around 2 ~ 5 seconds and memory taken is about 1~2GB

But I will take other tasks first, and more importantly, I feel I need make overall structure more clear. During my improvement, the unit test helps a lot, it gave me the ability to make changes confidently. But I also notice that a lot of changes are applied into unit test, need more think about how to make a cleaner structure.

CPU profile

cpu_05222020

@CodeBear801 CodeBear801 self-assigned this May 22, 2020
@CodeBear801 CodeBear801 added Prototype Proof of concept Refactor Rewrite existing code in order to improve its readability, reusability or structure Testing labels May 22, 2020
@CodeBear801 CodeBear801 added this to the v10.4.0 milestone May 22, 2020
@wangyoucao577
Copy link

The summary in #349 (comment) show significant improvement. However, it's a little bit difficult to me to figure out what really changed affect the performance from both the change summary and commit. Will you please consider to re-orignaize and describe them in a read friendly way? That would be helpful for other guys too. Haha~

@CodeBear801
Copy link
Author

The summary in #349 (comment) show significant improvement. However, it's a little bit difficult to me to figure out what really changed affect the performance from both the change summary and commit. Will you please consider to re-orignaize and describe them in a read friendly way? That would be helpful for other guys too. Haha~

Too many changes in unit test, but I do need to summary it more clearly.

@CodeBear801
Copy link
Author

CodeBear801 commented May 26, 2020

image

part 1 (#1, #2,#6)

#1 and #2 are related with golang map, you could see from map_access1_fast64 and map_assign_fast64. In the code, they related with this map

type edgeID2EdgeData map[edgeID]*edgeMetric

type edgeID struct {
	fromNodeID nodeID
	toNodeID   nodeID
}

type edgeMetric struct {
	distance float64
	duration float64
}

Based on profile, I found this hash map is large, with about 60M elements. Part of the reason for why so large is, I create virtual nodes for each charge station, for example, station A might have several virtual nodes indicate user charge for certain amount of energy in a charge station.

station_1 -> station_3: distance = 10, duration = 10

// vs

station_1_1 -> station_3_1: distance = 10, duration = 10
station_1_1 -> station_3_2: distance = 10, duration = 10
station_1_1 -> station_3_3: distance = 10, duration = 10

station_1_2 -> station_3_1: distance = 10, duration = 10
station_1_2 -> station_3_2: distance = 10, duration = 10
station_1_2 -> station_3_3: distance = 10, duration = 10

station_1_3 -> station_3_1: distance = 10, duration = 10
station_1_3 -> station_3_2: distance = 10, duration = 10
station_1_3 -> station_3_3: distance = 10, duration = 10

Basically, I just want to query for pre-calculated station to station weight. It would be much better if I create such map during initialization for all station-to-station. The optimization I used for now is, convert virtual node to original charge station ID to do the query, and to support such change, I used slice to support internal nodeid to external placeid instead of map, which removes #6.

So no matter station_1_* to station_3_*, I only use the original ID in the hash map, besides, I also try not to create new object edgeMetric. The changes are inside of d0f2f94

After we have discussion related with #352 , if we add new interface like below and generate related data structure during initialization, then 5 seconds could drop to 3~4 seconds

GetWeight(from PlaceID, to PlaceID) *Weight

part 2 (#3)

For #3, you could find GetLocation takes effort in CreateLogicalNode, originally, I try to isolate between packages so I create different structures for same target, obviously, its ok as long as it won't affect performance. So I use the same nav.Location object definition and try to directly return object to caller.
The changes are inside 6bc2486

part 3 (#4)

For #4, surprisingly, I noticed that string to int conversion cause lots of efforts, so I try to make the flow in query time all use int, this is the change in 3068da0
Later, if we want string(), then I will provide a interface to convert from integer to string

part 4 (#5)

Based on changes in #3 and #4, I adjust the interface to avoid any object allocation during connectivity query, that removes the performance cost in #5

@wangyoucao577
Copy link

Try to summary the improvements:

  • Reduced map size;
  • Used slice instead of map when possible to improve querying performance;
  • Removed unncessarily mass object transforming(i.e., use same *nav.Location);
  • Removed unncessarily mass string to int parsing;
  • Overall reduced a lot of unncessarily object allocation.

Did I missed anything?

@CodeBear801
Copy link
Author

CodeBear801 commented May 28, 2020

Try to summary the improvements:

  • Reduced map size;
  • Used slice instead of map when possible to improve querying performance;
  • Removed unnecessarily mass object transforming(i.e., use same *nav.Location);
  • Removed unnecessarily mass string to int parsing;
  • Overall reduced a lot of unnecessarily object allocation.

Did I missed anything?

That's all the changes.
From past week's performance tune, I gave another vote for following development patten

  • Make app have correct result and reasonable structure for the first step, functionality be guaranteed by tests. Prefer Clear via Performance
  • Profile software to find bottleneck, make adjustment
  • Refactor structure, make structure more clear
  • another loop for new features...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Prototype Proof of concept Refactor Rewrite existing code in order to improve its readability, reusability or structure Testing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants