Wishlist for Ciw 3.0 #206

Open
geraintpalmer opened this issue Jul 5, 2022 · 10 comments

Comments

@geraintpalmer
Member

A 3.0 release opens up the possibility of some big changes that are not backwards-compatible, as well as some smaller internal changes. Below is a list of possible features or changes that might appear. Please use the comments below to discuss.

  1. Remove ability to read in from YAML (reasoning: PyYAML does this for us and better)
  2. Remove ability to write results to file (reasoning: pandas does this for us and better. Ciw records are already lists of NamedTuples, which are easily compatible with pandas)
  3. Baulking and rejection losses should be recorded as DataRecords (exactly the same as how reneging and interrupted services are, for consistency)
  4. Classes should simply be indices (no need for the string "Class 1", just index by the number 1)?
  5. Add option for 'class_names' and 'node_names' in the Network object, so that the data records show these. This might make Ciw records more readable.
  6. Ensure consistent use of current_time / next_event_time internally (I think current_time is better)
  7. Rethink exact arithmetic (Decimal(3.2) + 4.1 = Decimal(7.3), so maybe no need for the increment_time thing)
  8. (If possible) simplify import_params.py (especially in terms of routing functions)
  9. Replace ciw.dists.NoArrivals() with None (this is how reneging and dynamic classes handle no distributions)
  10. Make it easier to choose Individual attributes to include in the DataRecords, e.g. Q = ciw.Simulation(N, attributes_to_record=['successful_service', 'age']) (see error rate #205 (comment) for how this is currently handled)
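
A small sketch of the arithmetic behind item 7, using only Python's standard `decimal` module and no Ciw internals. The variable names are illustrative assumptions. One caveat worth keeping in mind: in standard Python a Decimal cannot be added to a float directly, and a Decimal built from a float inherits the float's rounding error, so both operands are built from strings here.

```python
from decimal import Decimal

# Build Decimals from strings so the values are exactly 3.2 and 4.1.
current_time = Decimal("3.2")   # illustrative name, not a Ciw attribute
service_time = Decimal("4.1")   # illustrative name, not a Ciw attribute

# Decimal + Decimal is exact for these values.
exact_sum = current_time + service_time
print(exact_sum)  # 7.3

# A Decimal built from a float carries the float's binary rounding error.
from_float_is_exact = Decimal(3.2) == Decimal("3.2")
print(from_float_is_exact)  # False

# Mixing Decimal and float in an addition raises TypeError.
try:
    current_time + 4.1
    mixed_addition_allowed = True
except TypeError:
    mixed_addition_allowed = False
print(mixed_addition_allowed)  # False
```

So any simplification of the exact-arithmetic path would probably still need sampled times converted to Decimal before they are added to the clock.
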
@geraintpalmer
Member Author

@drvinceknight @11michalis11 Let me know your thoughts / suggestions.

@lec00q

lec00q commented Jul 22, 2022

May I add a couple of ideas?

  1. Allow state-dependent reneging distributions
  2. Allow general distributions for baulking
  3. Allow "batching" to be set at any node, that is, a mechanism that waits for a certain number of customers before they are accepted/released, and then accepts/releases all of them at the same time
  4. A "wait-to-be-pushed" mechanism, that is, a customer stays in service indefinitely until the queue is full and one more customer wants to join it. The new arrival then "pushes" all the others in the queue, and the customer who was in service is released
  5. Parallel computations (which, however, is always tricky)
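
One possible reading of the batching idea in point 3, sketched in plain Python. `BatchGate` is a hypothetical helper invented for illustration and is not part of Ciw. It only shows the hold-then-release-together behaviour, not how it would hook into a node.

```python
from collections import deque


class BatchGate:
    """Hypothetical helper: holds arrivals until `batch_size` are waiting."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.waiting = deque()

    def arrive(self, customer):
        """Add a customer; return the released batch, or [] if still waiting."""
        self.waiting.append(customer)
        if len(self.waiting) < self.batch_size:
            return []
        # The batch is complete: release everyone at the same time.
        batch = list(self.waiting)
        self.waiting.clear()
        return batch


gate = BatchGate(batch_size=3)
released = [gate.arrive(c) for c in ["a", "b", "c", "d"]]
print(released)  # [[], [], ['a', 'b', 'c'], []]
```
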

Hope it helps

@geraintpalmer
Member Author

Hi @lec00q, thank you so much for the suggestions!

I believe points 2) and 5) can be done already in some way:

  • Baulking distributions are always user-defined, and they take in the number of customers already present. This means that users can define hard-and-fast rules, probabilistic distributions, or something more complex, to decide whether a customer will baulk. However, I think this can be improved further by passing the simulation object itself to the baulking function, allowing the rules/distributions to use the full state of the network, rather than just the number of customers present at the current node.
  • I don't think parallel processing a single run of the simulation would be possible, as the logic is highly sequential. However, parallelising the trials can be done, for example using the `multiprocessing` library, or there are other solutions. I think a page in the documentation on how to do this might be beneficial.
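
A minimal sketch of parallelising independent trials with the standard `multiprocessing` library, as described in the second point. To keep it self-contained, `run_trial` is a stand-in (an assumption, not Ciw code) that simulates a simple single-server FIFO queue. In practice its body would build and run a Ciw simulation seeded with `seed`.

```python
import multiprocessing
import random
import statistics


def run_trial(seed, n_customers=1000, arrival_rate=1.0, service_rate=2.0):
    """Stand-in for one trial: mean wait in a single-server FIFO queue."""
    rng = random.Random(seed)  # each trial has its own seeded generator
    arrival_time = 0.0
    server_free_at = 0.0
    total_wait = 0.0
    for _ in range(n_customers):
        arrival_time += rng.expovariate(arrival_rate)
        service_start = max(arrival_time, server_free_at)
        total_wait += service_start - arrival_time
        server_free_at = service_start + rng.expovariate(service_rate)
    return total_wait / n_customers


def run_trials_in_parallel(seeds, processes=2):
    """Run one trial per seed across a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(run_trial, seeds)


if __name__ == "__main__":
    mean_waits = run_trials_in_parallel(range(8))
    print(statistics.mean(mean_waits))
```

Each trial is sequential internally. Only the trials themselves are distributed, and seeding by trial number keeps the results reproducible regardless of which worker runs which trial.
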

I love the idea of 3) and 4). Do you have any examples of this so that I can further understand what is meant here?

I think 1) is quite difficult. When implementing reneging, we found it difficult to define a state-dependent reneging mechanism clearly without incorrectly sampling multiple times, which would change the probability distributions. I would welcome further discussion on this though.

@drvinceknight
Contributor

  • I don't think parallel processing a single run of the simulation would be possible, as the logic is highly sequential. However, parallelising the trials can be done, for example using the `multiprocessing` library, or there are other solutions. I think a page in the documentation on how to do this might be beneficial.

I'm happy to PR this if you'd like me to @geraintpalmer

@geraintpalmer
Member Author

@drvinceknight that would be fantastic, thank you

drvinceknight added a commit to drvinceknight/Ciw that referenced this issue Jul 29, 2022
This was discussed at CiwPython#206.

Note that this adds a script to `docs/_static`. This is because I do not
believe the parallel processing can be doctested.
@drvinceknight
Contributor

drvinceknight commented Jul 29, 2022

I've opened #209 with https://ciw--209.org.readthedocs.build/en/209/Guides/parallel_process.html

@galenseilis
Contributor

galenseilis commented Oct 23, 2023

For item 4 of the wishlist, we can actually use any immutable type that has an ordering. See "Can Ciw Use Tuples For Class IDs?". Personally, I like this flexibility.

@geraintpalmer
Member Author

Thanks @galenseilis, this is a nice idea. I initially thought to keep customer classes as strings or integers, in order to make the data records easier to read with pandas, and to simplify reading from and writing to file. Do you see any issue with this?

@galenseilis
Contributor

I think that not making any further changes to customer classes is desirable for my use cases. Using strings or integers works in many cases, and it is also compatible with using tuples, depending on the project. A column of tuples can be "exploded" into multiple columns using pandas, so I am not particularly concerned about that. Overall, I like the current state.
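
To make the "exploding" point concrete, here is a sketch in plain Python of splitting a tuple-valued customer class into separate columns. The `Record` type is a trimmed, hypothetical stand-in for a Ciw data record, and the column names `priority` and `age_group` are assumptions chosen for illustration.

```python
from collections import namedtuple

# Trimmed, hypothetical stand-in for a Ciw data record.
Record = namedtuple("Record", ["id_number", "customer_class", "waiting_time"])

records = [
    Record(1, ("urgent", "adult"), 0.5),
    Record(2, ("routine", "child"), 2.0),
]


def explode_class(record, names=("priority", "age_group")):
    """Return the record as a dict with the class tuple split into columns."""
    row = record._asdict()
    class_parts = row.pop("customer_class")
    row.update(zip(names, class_parts))
    return row


rows = [explode_class(r) for r in records]
print(rows[0])
```

A list of dicts like `rows` can then be handed to whichever dataframe library the project uses.
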

@galenseilis
Contributor

Just an afterthought about Pandas & Ciw:

I think it is good to keep Pandas out of Ciw. Returning a dictionary as per the current state is good enough.

Partly because there are other dataframe tools (e.g. polars), but also because Pandas is not compatible with PyPy. On larger simulations PyPy can provide some easy wins on performance even if it means avoiding certain packages (e.g. SciPy, Pandas, SKLearn).
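
As an illustration of working without a dataframe dependency, records that are named tuples can be written out with the standard `csv` module alone, which should also run under PyPy. The `Record` type and its fields below are a simplified, hypothetical stand-in for Ciw's data records.

```python
import csv
import io
from collections import namedtuple

# Simplified, hypothetical stand-in for Ciw's data records.
Record = namedtuple("Record", ["id_number", "node", "arrival_date", "waiting_time"])

records = [
    Record(1, 1, 0.4, 0.0),
    Record(2, 1, 0.9, 0.3),
]


def records_to_csv(records, file_obj):
    """Write a header taken from the field names, then one row per record."""
    writer = csv.writer(file_obj)
    writer.writerow(records[0]._fields)
    writer.writerows(records)


buffer = io.StringIO()  # an open file would work the same way
records_to_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```
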
