You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I read the Scalability entry, and it's a good post. I'd add a couple more caveats (discussed briefly in the article). Not all "big data" scalability problems are built around scaling out the number of CPU cores; I've worked in "big data" scaling on Spark before and often built out clusters for 10,000-100,000 times the dataset size of the one on McSherry's laptop. The calculus for these sorts of systems starts to tip back towards "the cluster's better" fairly quickly when you're also dealing with bus and memory bounds (do you have enough memory to hold the data you need in-memory, plus room to receive shuffles? Do you have a local network/NICs that are adequate to run those shuffles in reasonable time? Do you have enough striped fast storage?)
I'd add the 1G (still fairly large, sure) dataset size to the Shower part and explain that this is heavily a warning against overengineering and premature optimization.
The text was updated successfully, but these errors were encountered:
I read the Scalability entry, and it's a good post. I'd add a couple more caveats (discussed briefly in the article). Not all "big data" scalability problems are built around scaling out the number of CPU cores; I've worked in "big data" scaling on Spark before and often built out clusters for 10,000-100,000 times the dataset size of the one on McSherry's laptop. The calculus for these sorts of systems starts to tip back towards "the cluster's better" fairly quickly when you're also dealing with bus and memory bounds (do you have enough memory to hold the data you need in-memory, plus room to receive shuffles? Do you have a local network/NICs that are adequate to run those shuffles in reasonable time? Do you have enough striped fast storage?)
I'd add the 1G (still fairly large, sure) dataset size to the Shower part and explain that this is heavily a warning against overengineering and premature optimization.
The text was updated successfully, but these errors were encountered: