{"pages":[{"text":"I am a Physics PhD candidate at UC Davis in observational cosmology . Contrary to popular belief that I have to climb up to big telescopes to get data, I code to collect data about my astronomical targets and analyze the data to test our understanding of 'cosmological' stuff like dark matter / cosmological parameters . (Image credit: Jee et al. 2014, NASA, ESA) \"El Gordo\" - the heavy galaxy cluster that I studied. We have used the false color in blue to represent the distribution of dark matter, and red to represent the intercluster gas. My blog here is an effort for me to communicate and discuss my work flow with people from within and outside of (astro)physics. I like putting together and using reproducible research tools as I believe they can make the scientific process more transparent. Here I will put up non-technical and technical blog posts about: cosmology / astrophysics, data analysis (Bayesian statistics and Machine Learning), computing, with a theme about solving problems with different research tools / programming languages.","tags":"pages","loc":"https://karenyyng.github.io/pages/about.html","title":"About"},{"text":"I am a Physics PhD candidate. My main thesis is about analyzing astronomical data. Check out my comprehensive Data Science skillset from my portfolio below: Data presentation skills Visualization skills link to accurate scientific visualization of telescope data This website is written in markdown and parsed to a working website with Python Pelican Software engineering performance profiling link1 to the different scaling plots from performance profiling (in Python ) for reading 51 GB of CSV files in parallel link2 to the single threaded version code for reading the same 51 GB of CSV files with three approaches, 1) Python Pandas , 2) R with Bash and 3) the R package FastCSVSample link3 to a visualiation of the log file from the Intel Vtune profiler tool. I was able to get rid of a performance bottleneck using the information of the visualization and speeding up the code by at least 10 times. Statistical and Machine Learning skills Clustering link to Gaussian mixture clustering for separating different populations of data (signal vs noise from foreground and background) Regression - optimization link to a custom-built Gaussian process for to encapsulate the properties of the problem setting. The code was written with Python, C++ and Cython . The parallelization was achieved with OpenMP and Intel Math Kernel Library . Gaussian Process is a powerful, flexible statistical algorithm. An important use case of Gaussian Process is for tuning the hyperparameters of Machine Learning algorithms such as neural networks ( link to a related NIPS publication). Model comparison / hypothesis testing link to model comparison by computing the Bayes factor link to determining model complexity of a Gaussian Mixture Model Data query (SQL) / wrangling / scraping link to an add-on to a Supercomputer Package (Spack) for querying the https://pypi.python.org/pypi Python package indices and parsing the installation recipes for Python packages link to a Python script that queries a MySQL database and returns a Pandas dataframe PhD project 1 Link to GitHub repository understanding the dynamics of a merging galaxy cluster Accurate scientific visualization 1 Fig.1 Accurate scientific visualization of the galaxy cluster El Gordo by overlaying inferred information on Hubble Space Telescope data. 
PhD project 1 Link to GitHub repository understanding the dynamics of a merging galaxy cluster Accurate scientific visualization 1 Fig. 1 Accurate scientific visualization of the galaxy cluster El Gordo by overlaying inferred information on Hubble Space Telescope data. The two components of dark matter are marked by the blue crosses ( northwest (NW) and southeast (SE) ). I have also added contours of the gas component from the X-ray data in red. The white dashed lines indicate the locations where shockwaves (relics) were detected in the radio wavelengths. All these observables help confirm that the components of the galaxy cluster went through a merger. By making use of the merger shockwave observation in the radio wavelengths, we show that the two main dark matter components of this particular galaxy cluster are likely to be moving towards one another for a second merger (the returning scenario is more likely than the outgoing scenario). Fig. 2 of my paper - Illustration of the spatial locations of the different components of the merging cluster El Gordo at different stages of the merger. Fig. 3 The marginal distributions of the estimated time-since-pericenter (TSP) for the two merger scenarios ( ret for returning and out for outgoing) and the periodicity T. The shaded regions are the 68% and 95% confidence intervals of the 1D marginal distributions on the diagonal subplots of the triangle plots. Fig. 4 Visualization of the conditional probability distribution of the possible shockwave (relic) location vs the observed location. The observation in red favors the returning scenario. The published version of my paper, titled 'The return of the merging galaxy cluster of El Gordo?', can be found on the journal website here . A free version of my published paper can be found on arXiv . Hack session analysis from the unconference AstroHack Week 2014 Goal: separate the signal from the foreground and background using a Gaussian mixture model Original smoothed (discrete) galaxy data that is contaminated by foreground and background noise. The Gaussian mixture model (GMM) separates the data into 3 populations, i.e. signal (red), background (blue) and other noise (green). The left plot shows the shape of the actual mixture; the right panel shows the feature space that has physical meaning. The \"signal\" cluster in red from the 3-component GMM model still seems contaminated. The Bayesian information criterion (BIC) uses both the likelihood and the model complexity to measure how many components of the Gaussian mixture model (GMM) would best explain the data. The lower the score, the better the model. The spatial distribution inferred by the 7-component Gaussian mixture model is tighter and has a lower false-positive rate.
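The hack-session links above point to the actual notebooks; the following is a minimal sketch (my own illustration with synthetic data, not the hack code) of the BIC-based model selection step using scikit-learn's GaussianMixture: fit mixtures with an increasing number of components and keep the one with the lowest BIC.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# synthetic 2D positions: a tight 'signal' clump plus a uniform 'background'
signal = rng.normal(loc=0.0, scale=0.5, size=(300, 2))
background = rng.uniform(low=-5.0, high=5.0, size=(700, 2))
X = np.vstack([signal, background])

bics = []
for k in range(1, 11):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(gmm.bic(X))  # lower BIC = better trade-off of fit vs complexity

best_k = int(np.argmin(bics)) + 1
print('number of components favored by BIC:', best_k)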
PhD project 2 (ongoing work) using Gaussian processes for modeling cosmic shear Dark matter spans our universe like a cosmic web, gravitationally deflecting the paths of light from background galaxies. This work is the first to model the dark matter density with a Gaussian process (from which Gaussian random fields are drawn). I am employed as a graduate research assistant for 6 months at the National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory, for this project. We are actively working on innovative approaches to scale up the computation. My first publication from this project is in preparation. PhD project 3 (ongoing work) characterizing the galaxy - dark matter offset in galaxy clusters / groups in the Illustris simulation We aim to answer some urgent questions about the ability of merging clusters to constrain the self-interaction properties of dark matter with the data from the Illustris simulation. The Illustris simulation is one of the first cosmological simulations (think: a big box of the universe modeled by a computer) with realistic galaxy / star formation. Since the Illustris simulation was produced without including any self-interaction of dark matter particles, it provides an excellent baseline for understanding the bias / variance of our statistical techniques for finding offsets between the dark matter and the galaxies in a galaxy cluster. To get a sense of the relative scale of the galaxy clusters that I study, you can watch the first 15 mins of my public talk . Or you can look at the pretty pictures from the Illustris simulation . Other projects that resulted in publications 1. MC^2: Galaxy Imaging and Redshift Analysis of the Merging Cluster CIZA J2242.8+5301 William A. Dawson, M. James Jee, Andra Stroe, Y. Karen Ng , Nathan Golovich, David Wittman, David Sobral, M. Bruggen, H. J. A. Rottgering, R. J. van Weeren, ArXiv preprint, Astrophysical Journal, 805, 143 2. Weighing \"El Gordo\" With a Precision Scale: Hubble Space Telescope Weak-Lensing Analysis of the Merging Galaxy Cluster ACT-CL J0102-4915 at z=0.87 M. James Jee, John P. Hughes, Felipe Menanteau, Cristobal Sifon, Rachel Mandelbaum, L. Felipe Barrientos, Leopoldo Infante, and Karen Y. Ng , 2014, ApJ, 785, 20, 1309.5097, doi:10.1088/0004-637X/785/1/20","tags":"pages","loc":"https://karenyyng.github.io/pages/projects.html","title":"Projects"},{"text":"Capturing software builds with Docker April 14, 2016 I learned about Docker for one of my thesis projects. Docker is such a wonderful tool for reproducible research and deployment that I volunteered to give a talk at The Hacker Within Davis chapter monthly gathering. See the description on The Hacker Within Davis chapter website . Later, on 05/16/2016, the Linux Users' Group of Davis (LUGOD) also invited me to give the same talk . The slides are available here . The return of the merging galaxy subclusters of El Gordo? Mar 2015 This is a more up-to-date talk on my paper, given at the SnowCluster conference dedicated to the study of galaxy clusters . The PDF link can be found on the SnowCluster website. Oct 2014 This is a technical talk on my paper given to my collaborator Prof. Lars Hernquist and the galaxy cluster group at the Harvard-Smithsonian Center for Astrophysics (CfA). San Francisco Amateur Astronomers - invited talk 08/20/2014 I was invited to give a public talk about my work on the merging galaxy cluster El Gordo after the press release of the mass measurement work. I gave a general background introduction to galaxy clusters and talked about the anatomy of a galaxy cluster . Link to abstract Slides with video recording","tags":"pages","loc":"https://karenyyng.github.io/pages/talks.html","title":"Talks"},{"text":"Observation trip to the Keck Observatory in Hawaii Dec 2013 Me on top of Mauna Kea after my observation run in the Keck (remote) observatory control room. You can see several telescope domes behind me; the leftmost one is the Subaru telescope, and the others are the Keck 1 and Keck 2 telescopes. I know people love to think that I (and other astronomers) have to climb up to a control room and use a joystick to control the telescope ... The truth is ... I just had to code to plan my observation and fill out forms to let the telescope operator help me...
Most of the observation was automated, or else I might destroy the telescope ;) The \"masks\" (the metal plates with holes) that I designed to allow only the target light to reach our detector on the Keck 2 telescope. Lick Observatory workshop Oct 2012","tags":"pages","loc":"https://karenyyng.github.io/pages/trips.html","title":"Trips"},{"text":"Table of Contents for this post Skip to the relevant sections if needed. Concepts for resolving Git conflicts Setting up different editors / tools for using git mergetool Finding out what mergetool editors are supported mergetool simple code example for vimdiff Other great references and tutorials Concepts for resolving Git conflicts To use mergetool in git , we need the following terminology to understand what is being merged: LOCAL - the file(s) from the current branch on the machine that you are using. REMOTE - the file(s) from the remote location that you are trying to merge into your LOCAL branch. BASE - the common ancestor(s) of LOCAL and REMOTE . MERGED - the tag / HEAD object after the merge - this is saved as a new commit. Most mergetool editors will display both LOCAL and REMOTE so you can decide which changes to keep. Setting up different editors / tools for using git mergetool We have to change the git config to set a default mergetool. In this example, we will use vimdiff : $ git config merge.tool vimdiff We can also set the editor to display the common ancestor BASE while we examine what changes are in LOCAL and REMOTE with the following setting: $ git config merge.conflictstyle diff3 back to top Finding out what mergetool editors are supported $ git mergetool --tool-help And we list a few of them: Command line mergetool editors Emacs-based diff tools: emerge , or Ediff Vim-based diff tool: vimdiff GUI mergetool editors gvimdiff - almost identical to vimdiff but uses the Linux GUI for Vim ; please refer to vimdiff if you still use the keyboard commands for GVim . kdiff3 meld tortoisemerge Or consult the community of your favorite editor to see how to do the equivalent operations in your editor. Other useful mergetool settings Do not prompt before launching the merge resolution tool: $ git config mergetool.prompt false back to top mergetool simple code example Ref1 for the example Ref2 Creating the git repo: $ mkdir galaxyZoo $ cd galaxyZoo $ git init $ vim astrophy_obj.txt Add some galaxy types into astrophy_obj.txt then save the file. # content of astrophy_obj.txt spiral elliptical bar irregular Save, then commit the file. $ git add astrophy_obj.txt $ git commit -m 'Initial commit' $ git branch astrophy_objects # create a new branch $ git checkout astrophy_objects # change to new branch $ vim astrophy_obj.txt # make changes to file Change bar to barred in the file. $ git commit -am 'changed bar to barred' $ git checkout master # change back to master branch $ vim astrophy_obj.txt # add the word `galaxy` to the end of each line using Vim REGEX # type `:%s/$/ galaxy/g` in Vim then press enter and save with `:wq` $ git commit -am 'added galaxy to each line' # merge from the astrophy_objects branch into the current branch, i.e. master $ git merge astrophy_objects Then you will see some error messages: Auto-merging astrophy_obj.txt CONFLICT (content): Merge conflict in astrophy_obj.txt Automatic merge failed; fix conflicts and then commit the result.
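At this point the conflicted file itself contains conflict markers. With merge.conflictstyle diff3 set as above, the conflicted region of astrophy_obj.txt looks roughly like this (the exact labels vary slightly between git versions):

<<<<<<< HEAD
bar galaxy
||||||| merged common ancestors
bar
=======
barred
>>>>>>> astrophy_objects

The top section is the LOCAL version (the current branch, master ), the middle section is the BASE version, and the bottom section is the REMOTE version (the astrophy_objects branch).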
We can bring up the mergetool : $ git mergetool Then it will bring up the different versions of the file in different Vim split panels. +--------------------------------+ | LOCAL | BASE | REMOTE | +--------------------------------+ | MERGED | +--------------------------------+ The top-left split panel is LOCAL , the top-middle split is BASE and the top-right split is REMOTE . The bottom split is the MERGED version. You can find this info in the bottom bar of each split (I have put 3 yellow rectangles to highlight that info). As you can see from the image below, my Vim has highlighted the differences in red for me. Now if your terminal has any GUI capability and you have compiled Vim with GUI support, you can use your mouse to click on the bottom split to edit it. Or if you are a Vim ninja, you can use the keyboard shortcuts to move between splits. Ctrl w + h # move to the split on the left Ctrl w + j # move to the split below Ctrl w + k # move to the split on top Ctrl w + l # move to the split on the right You can either incorporate the changes by manually editing the MERGED split, or use Vim shortcuts to pull from one of the LOCAL , BASE and REMOTE versions. :diffg RE # get from REMOTE :diffg BA # get from BASE :diffg LO # get from LOCAL Save the changes, then quit with :wqa to close all the splits. Remember to commit the merge. $ git commit -am 'merged from several branches' Other tips: if you were trying to do a git pull when you ran into merge conflicts, issue $ git rebase --continue Hooray, now you can claim that you can collaborate with others with Git without messing up your collaborators' commits. back to top Other vimdiff keyboard shortcuts ]c - Jump to the next change. [c - Jump to the previous change. ref Other great references and tutorials git mergetool page on git-scm.com tutorial about the concepts of branching and merging from Charles Duan more on vimdiff as a git mergetool back to top","tags":"articles","loc":"https://karenyyng.github.io/posts/git_mergetool_tutorial.html","title":"Git mergetool tutorial"},{"text":"I have been making incremental improvements to the blog, including: a slight stylistic CSS change (I finally know which CSS to fix) fixing the jQuery plugin for returning search results properly When there is enough material I will blog about how to customize different parts of this statically generated blog using Pelican . Learning schedule Out of all the things that I wish to learn in grad school (besides Physics and Cosmology, obviously), Bayesian statistics and PGMs are things that I have wanted to learn about for a long time. I believe a better understanding of probabilistic models / statistical concepts will aid my data analysis tremendously. Therefore, I have stopped giving myself excuses and started working on the Coursera material on Probabilistic Graphical Models (PGM) . Related reading that I have assigned myself includes: Probabilistic Graphical Models: Principles and Techniques by Koller and Friedman, which is the course textbook, and Bayesian Reasoning and Machine Learning by Barber, which is one of my favorite textbooks, with really concrete examples and the accompanying BRMLToolKit code, which is written in Matlab . As programming exercises, I have been trying to rewrite some of the code in Python here . I am curious to see how good the Python package libpgm is, which is used in this book. Now I don't know how good the statistics / probability part of the book is, but I am willing to give the Python package a try without reinventing everything from scratch if it is good. Finally, since most of the books above contain mostly machine learning / probability concepts, I am adding one last book to the list to gain more perspective from the statistics side of things.
It is written by one of my favorite (Bayesian) statisticians, Andrew Gelman. There are other things that I wish to do later in grad school, such as actually practicing data analysis with Kaggle competitions, but I should focus! I am giving myself half a year from now just to learn as much about PGM / Bayesian statistics as possible before diving into Kaggle.","tags":"articles","loc":"https://karenyyng.github.io/posts/learning_schedule_mid2015.html","title":"2015 mid-year status and goals"},{"text":"Why virtualenv It's a tragedy whenever code breaks after a software \"update\". I have recently started using virtualenv for Python to prevent my MacPorts / apt-get updates from messing up the code of my research projects. Additionally, by providing a list of Python package versions, it can enhance reproducibility. Before I forget how to set it up correctly, I should write about it. Prerequisites I will use Python 2.7 and assume the use of MacPorts for this post. You should also have py27-virtualenv and pip installed, either with pip, MacPorts or apt-get, or, if you like, from source. You can also replace pip with easy_install if you prefer easy_install instead. I also assume you know some basic shell commands. How to create a virtualenv If you just want to use virtualenv with the same Python version as the system-wide one, call the virtualenv executable at a terminal: $ virtualenv-27 --no-site-packages ENVNAME The command could also be virtualenv for some people; I assumed that virtualenv was installed using MacPorts. Creating the virtualenv this way instructs it not to include the paths to the system-wide Python packages inside the virtual env (using --no-site-packages seems to be the default behavior for virtualenv, but let me mention it anyway). After this command, whatever Python packages are installed on the PATH variable in your shell configuration file (e.g. .bashrc / .bash_profile) will be ignored when the virtualenv is activated. Caveats Note that if you have Python package paths set as part of PYTHONPATH , your virtualenv will not work properly, so you will want to remove any system-wide Python packages from your PYTHONPATH . Once the command above is done executing, a folder ENVNAME will be created that has the structure for installing its own set of Python packages. $ ls ENVNAME will show you the following directory structure: bin include lib share Next, you want to figure out what packages to install and what not to install. These packages will be installed within the ENVNAME directory structure, more specifically inside: ${ APPROPRIATE_PATH } / ${ ENVNAME } /lib/python2.7/site-packages Now let's try to activate the virtual env: $ source APPROPRIATE_PATH/ENVNAME/bin/activate Then your terminal prompt will change to (ENVNAME)$ Check that the pip inside this environment is the local version: (ENVNAME)$ which pip should show (ENVNAME)$ ENVNAME/bin/pip Now you can feel free to use this local version of pip to install whatever packages you want. Saving package versions and reinstalling packages Let's say you have installed a bunch of packages for a specific coding project and you want to save the packages used and their respective versions. We can have pip list all the Python packages you have installed and save the list somewhere: ( ENVNAME ) $ pip freeze > packages.txt
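The resulting packages.txt is just a plain-text list of pinned versions, one package per line; the package names and version numbers below are made-up examples:

numpy==1.8.2
scipy==0.14.0
astropy==0.4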
Let's say you want to set up the same virtual environment on another machine. Now you can copy over packages.txt, set up the machine using the steps above, then reinstall your packages within this local environment: ( ENVNAME ) $ pip install -r packages.txt which may or may not work well, depending on whether you have packages from external sources. Let's say you are like me and have installed some Python packages from non-conventional sources; one hackish way to make sure those will be installed properly in the virtualenv is to just copy the directory containing the installed Python packages into the local environment: ( ENVNAME ) $ sudo cp -r PATH1 APPROPRIATE_PATH/ENVNAME/lib/python2.7/site-packages where PATH1 is: $ echo $PATH1 \"/TIME_MACHINE_BACKUP/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python/2.7/site-packages/\" Now you can check that the library versions are alright: ( ENVNAME ) $ ipython In [ 1 ]: import astropy In [ 2 ]: print astropy . __version__ Out [ 2 ]: '0.4' while you can check that the system-wide version of the library is different after deactivating the virtual env: $ deactivate $ ipython In [ 1 ]: import astropy In [ 2 ]: print astropy . __version__ Out [ 2 ]: '0.4.1' Automating virtualenv activation The last bit of this post is about being lazy and having the virtualenv automatically activate itself when you switch to the appropriate directory on the command line. You will need the Python package autoenv , which can be installed by: $ pip install autoenv $ echo \"source $PATH_TO_AUTOENV /activate.sh\" >> ~/.bash_profile $ cd $APPROPRIATE_PATH / # this is the parent directory where ENVNAME lives $ echo \"source ABSOLUTE_PATH/ENVNAME/bin/activate\" > .env Now whenever you switch to the appropriate directory, the virtual env will be activated for you. Troubleshooting If you see any weird behavior when importing modules in the virtualenv or when using pytest, make sure that you are using the versions of the Python packages installed from within the virtualenv and not the system versions. Creating a virtualenv for a different Python version Let's say that I use Python 2.7 for most of my projects and would like to play with Python 3.4. I can first install Python 3.4 using MacPorts: $ sudo port install python34 py34-pip Then the only difference when creating the virtualenv is to specify Python 3.4: $ virtualenv -p $( which python3.4 ) ENVNAME --distribute Other virtualenv tutorials There is a wonderful tutorial from astropy.","tags":"Python","loc":"https://karenyyng.github.io/posts/using_virtualenv.html","title":"Using Virtualenv for safeguarding research project dependencies"},{"text":"One thing that I greatly admire about my advisor, Prof. D. Wittman, is that he explains things very clearly. As a science student, I think this is a very important ability that I still need to improve on. By setting up this blog, I hope to force myself to explain and share with others the mini-projects / ideas that I am working on / have worked on.","tags":"Personal goals","loc":"https://karenyyng.github.io/posts/first_post.html","title":"New year resolution + 1st post"},{"text":"Using PyTables to store / organize data is just awesome! Reading this lowers my blood pressure level. http://pytables.github.io/usersguide/tutorials.html#dealing-with-nested-structures-in-tables","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/pytables.html","title":"PyTables!"},{"text":"It has been a wonderful week in Seattle. The organizers have really done a great job of getting the participants together to work on stuff.
And special thanks to Phil, who did a marvelous job of talking to the participants and rebooting the atmosphere when people got stuck. The progress of my little project can be viewed at: https://hackpad.com/Representing-overdensities-in-astro-data-clustering-KDE-rz6RcKo666V And there will be reports of more small \"hacks\" coming.","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/astrodata-hack-week-conclusion.html","title":"AstroData Hack Week conclusion"},{"text":"Fernando Perez is giving the lecture about the IPython notebook this morning. These are some notes of the interesting ideas: import all the dependencies when you try to export some cell content into a function you can use %celltofunc but it is still work in progress jinja template files for conversion to ipynb ipython nbconvert --to python --template=simplypython.tpl convert ipynb as documentation in sphinx Jake Vanderplas took over just now to teach efficient programming in numpy (!!!) he talked about how to use the reduce ufunc ... which I almost never use but is useful to know Back to Fernando about cool coding + collaboration stuff: https://gitter.im/ - for any public git repo you can start a conversation with any git users seaborn is a nice high-level library for plotting data sets, even if they are in pandas DataFrame format! virtual machines with packages shipped with an ipynb that documents the data analysis of a paper, using StarCluster ...... Jupyter - people are trying to make the IPython notebook more inclusive, since it can communicate not only with Python but also with Julia and R! http://jupyter.org/ Some statistics from an astrophysics perspective Dan Foreman-Mackey gave a presentation on the application of Gaussian processes for detecting exoplanets, or what's known as Kriging in geophysics. And he recommended some online books at http://www.gaussianprocess.org/gpml/","tags":"hackweek","loc":"https://karenyyng.github.io/posts/astrodata-hack-week-day-1.html","title":"AstroData Hack Week - day 1"},{"text":"I just gave a public talk on El Gordo, the galaxy cluster that I've been working on. This is almost exactly one year after I gave a more technical talk on the same cluster in my PhD candidacy exam. And the slides are at ...... http://slideslive.com/38891780/el-gordo","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/sfaa-talk-on-el-gordo.html","title":"SFAA talk on El Gordo"},{"text":"This is my new favorite book about Bayesian statistics, which I learned about from the Penn State Astrostatistics summer school. Note that it's written by an astronomer and provides Mathematica code (can't get more astro / physics than that!!! I think only physicists / astrophysicists have such a love for Mathematica, which is very expensive in my opinion ... ) Today I have been reading about sample comparison: given two samples, determine if they come from the same distribution. For such a problem, frequentists assume a null hypothesis that the samples come from the same distribution and then do some tests to try to reject that null hypothesis. Bayesians evaluate the probabilities of the different cases: means being the same, variances being the same; means being the same, variances being different; means being different, variances being the same; and means being different, variances being different. For details, read Appendix C of the book.
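As a rough sketch of the flavor of that comparison (my own illustration, not the book's treatment - the book integrates over the parameters properly, while here I use the crude BIC approximation to the Bayes factor and assume normal likelihoods throughout), one can score, say, the 'same mean' case against the 'different means' case:

import numpy as np
from scipy.stats import norm

rng = np.random.RandomState(1)
x = rng.normal(0.0, 1.0, size=100)  # sample 1
y = rng.normal(0.3, 1.0, size=120)  # sample 2

def bic_same_mean(x, y):
    # one shared mean and one shared sigma: 2 free parameters
    z = np.concatenate([x, y])
    loglike = norm.logpdf(z, loc=z.mean(), scale=z.std()).sum()
    return 2 * np.log(len(z)) - 2 * loglike

def bic_diff_means(x, y):
    # separate means, one shared sigma: 3 free parameters
    sigma = np.concatenate([x - x.mean(), y - y.mean()]).std()
    loglike = (norm.logpdf(x, loc=x.mean(), scale=sigma).sum()
               + norm.logpdf(y, loc=y.mean(), scale=sigma).sum())
    return 3 * np.log(len(x) + len(y)) - 2 * loglike

# BIC-approximated ln(Bayes factor) of 'different means' over 'same mean'
ln_bf = 0.5 * (bic_same_mean(x, y) - bic_diff_means(x, y))
print('approximate ln Bayes factor:', ln_bf)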
Is this what I want for answering the scientific question that I have in mind? I am not sure. I have to think harder / talk to some Bayesians.","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/bayesian-logical-data-analysis-in-the-physical-sciences.html","title":"Bayesian Logical Data Analysis in the Physical Sciences"},{"text":"Fig Credit: (one of my favorite comics - XKCD) Today I'm reading the intro part of the book that talks about different types of telescopes. I might have read it before, but I guess I should be familiar with them. When I talk with my fellow astronomer friends, I want to know which telescope they are referring to ... where it is, what it is for, etc. Today I only aim to read about radio telescopes because they come first in the book. Radio wavelengths are some of the few wavelengths that we can observe on Earth, from lambda ~ 5 mm to ~30 m. Important features that people try to detect in the radio wavelengths include: the 21-cm emission line, which is a sign of neutral hydrogen; the 1.2 mm feature for high-redshift galaxies ... maybe they are looking at X-band dropouts? (where X is some photometry band) Important telescopes include: the VLA in New Mexico; the Arecibo telescope in Puerto Rico, which is the largest non-steerable single-dish radio telescope; the James Clerk Maxwell Telescope on Mauna Kea; and very long baseline interferometry (VLBI), which consists of multiple telescopes on several continents so they can have different baselines. And I know there should be a few more radio arrays coming online all around the world doing exciting early-universe cosmology (e.g. for the epoch of reionization). They are going to have massive data sets, according to the people I met at Penn State. Hopefully they will also figure out what to do with the massive data and get good science out of them :)","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/extragalactic-astronomy-and-cosmology-2.html","title":"Extragalactic astronomy and cosmology 2"},{"text":"I have always liked Feynman as a physicist - I felt like his lectures might be too good not to be read during grad school. Today I am reading the E&M part - since I am an E&M TA this quarter. I am starting from the very first chapter, where he describes the basics - you can usually tell if a book is good if the author can go through the basic math in a crystal-clear way. I cannot believe how he verbalized Gauss' law and the concepts of flux and divergence. It is just mind-blowing - every undergrad should have learned it this way, i.e. to understand what's going on before looking at the math symbols! He put: (Electric) flux = (average normal component) * (surface area) (Magnetic) circulation = (avg. tangential component) * (distance around the loop) so that Gauss' law can simply be verbalized as: flux of the E field through any closed surface = enclosed charge / epsilon_naught
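In the usual integral notation (my rendering of that verbalization), Gauss' law reads:

\oint_S \vec{E} \cdot d\vec{A} = \frac{Q_{\mathrm{enc}}}{\epsilon_0}

where the left-hand side is exactly the flux above: the average normal component of \vec{E} times the surface area.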
Those are just some really simple and easy-to-understand words that hide all the complicated math from first-time learners. Sometimes I really hope that I have the clarity to promote physics concepts like these to my students. As a TA and a physics student myself, I experience all too often that once a student starts fearing a subject, i.e. once s/he sees the complicated math, s/he stops thinking about what's going on and just tries to pull in whatever seems relevant for solving a problem. They would not make the equations / math / physical model adapt to the problem, but just substitute whatever seems convenient. Setting up the problem correctly gets you through 90% of the problem; the rest is usually just algebra / computation. Students often do not see that..... Anyway, it's good that the Feynman lectures are at an undergrad level, so it's still quite a light read. Hopefully I will get through all three volumes, then start reading some graduate-level books such as Landau and Lifshitz.","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/feynman-lecture-on-physics.html","title":"Feynman lecture on Physics"},{"text":"To be honest, I never read the book very carefully when it was used as the text for my 2nd-year cosmology data analysis class (sorry, Chris). It is actually quite a fun read, since it starts by describing the important statistical developments throughout history in the first chapter and then goes through the basics. There isn't a lot of math, so it's easy to skim through. The best point is that the examples are very relevant and point out common pitfalls. For example, in astronomy a lot of stuff is in power-law form; it turns out a power-law distribution does not fit the formal definition of a p.d.f. (i.e. a p.d.f. has to be > 0 and integrate to 1), and the binning of data that follows a power-law distribution has to be handled with care, etc. I would give it bonus points for including a lot of Bayesian material and giving good comparisons between the frequentist and Bayesian views. :D","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/practical-statistics-for-astronomer.html","title":"Practical Statistics for Astronomer"},{"text":"Definitely one of my favorite texts on extragalactic astronomy - so fond of it that I actually own a physical copy. Today I am reading a subsection on stuff that I work on - X-ray radiation from clusters of galaxies :D","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/extragalactic-astronomy-and-cosmology.html","title":"Extragalactic Astronomy and Cosmology"},{"text":"It is a Saturday, so I get to choose what type of books I read in my \"leisure time\". As a continuation of my quest to use Python (well) for my research, I have been investigating the possibility of using HDF5 to store some simulation results ... HDF5 is a flexible and fast file format that theoretical astrophysicists prefer nowadays (I also have to mention that the file format was first designed at my beloved college, UIUC!) I like it especially because of: fast read times - the library is written in C, with the nice Python wrapper library h5py ; flexibility - you can choose a chunk of the data to read, not the entire dataset; metadata storage - this is important coz I don't want to have to remember what exact parameters I used for creating a dataset 3 months after creating it; I want the data to be self-documented; a data structure organized like directories - beyond the scope of the discussion here, since I haven't got to that part of the book yet. All in all the book was alright, but they should really include a reference list of the most commonly used commands.
Here they go (for my later reference): >>> import h5py >>> import numpy as np >>> f = h5py.File(\"filepath\", \"w\") # let's just open a file to write # create two datasets with names \"array1\" and \"array2\" >>> f[\"array1\"] = np.ones((100, 1000)) # initialize a big 2D array >>> f[\"array2\"] = np.zeros((int(1e5), int(1e6))) # easy writing of metadata as attributes >>> f[\"array1\"].attrs[\"info\"] = \"big array\" >>> f[\"array2\"].attrs[\"info\"] = \"bigger array\" >>> f.close() Now this part about reading and examining the data is a bit lacking in the book - just tell me that ONE command that I need!: >>> f = h5py.File(\"filepath\", \"r\") # read only # this should give you the keys, e.g. [\"array1\", \"array2\"] # so you don't have to remember what \"datasets\" are actually in the file # this is the one command to rule them all >>> f.keys() >>> f[\"array1\"].attrs.keys() # gives you the keys for calling the attributes h5py has a weird official syntax for reading data back into Python: >>> arr = f[\"array1\"][...] However, the following alternative syntax works for reading from both an HDF5 dataset and a NumPy array stored in a dictionary, so I will stick to this alternative syntax instead: >>> arr = f[\"array1\"][:] which extends naturally to slicing an array: >>> arr = f[\"array1\"][:10] # read the first 10 entries Moving on to the next topic: if you have Linux / Mac and you just want to check the file structure quickly, at the command line you can do: $ h5ls -vlr file.h5 It would spit out the descriptions of the datasets and the keys for calling the datasets.
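One more trick worth noting (my addition, not from the book): h5py can also create chunked, compressed datasets, which pairs nicely with reading back only a slice; the file name and shapes here are made up:

>>> f = h5py.File('filepath2', 'w')
>>> dset = f.create_dataset('array3', shape=(int(1e5), 1000), dtype='f8', chunks=(1000, 1000), compression='gzip')
>>> dset[:1000, :] = 1.0  # write one chunk's worth of rows
>>> f.close()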
Currently I am not trying to do anything fancy with the HDF5 files yet, but I think it'd be good to use HDF5 in the long run.","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/python-and-hdf5.html","title":"Python and HDF5"},{"text":"Was reviewing some basics in astronomy. Stuff such as the color-magnitude diagram for showing the relationship between the color and the brightness of stars (weird that astronomers call it color-magnitude, because it is really a magnitude vs color diagram). Was also glad to see discussions of collisionless encounters (e.g. stuff that I work on). Great book for reviewing astrophysics overall - very clear and in depth. Will aim to read the potential theory part, which has more to do with the dynamical DM simulations that I do.","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/galactic-dynamics.html","title":"Galactic dynamics"},{"text":"Tried a couple of ways to debug Python code, including using the %pdb magic within IPython or using %debug in post-mortem mode. Still couldn't figure out how to restart the code within the pdb invoked by the pdb magic without pdb raising a \"restart\" error. So I think maybe the best way is to stick with the IPython debugger, ipdb , which has autocomplete and syntax highlighting: $ python -m ipdb myscript.py and insert breakpoints with: ipdb> b mymodule.py:lineNum and debug until the next breakpoint, or restart, by typing the shortcut for continue: ipdb> c which behaves more similarly to gdb, which I'm more used to. If you want to debug a certain function from another module, set a breakpoint to step inside that function by: ipdb> from module import function ipdb> b function and list all the arguments of this particular function: ipdb> args We can also execute Python code by prefixing it with an exclamation mark: ipdb> !my_python_code_or_command Stepping through the code is also the same as gdb: ipdb> step # or s One of my favorite commands is actually ipdb> up which moves the debugger one frame up the stack trace, i.e. to the line that called the current frame. These are all the functionalities of pdb that I make use of so far. Other useful tips can be found at http://docs.python.org/2/library/pdb.html Trying to say goodbye to debugging with print statements.","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/debugging-python-code-with-ipdb.html","title":"Debugging python code with ipdb"},{"text":"If you know me, then you would know that I am not a big fan of using the mouse. Back in the old days when I used Linux, I could customize most of the keyboard shortcuts, so it was easy to stay with the keyboard without using the mouse. Lol, even though I did set up Exposé on my Linux box, it was just for fun, not out of necessity. After switching to a Mac, I got distracted and never actually learned how to navigate with the keyboard only. It was also one of the grudges that I first had against using a Mac. But today, I have finally found what I want! Cmd Tab - helps you switch between applications Cmd ` (this is the grave accent sign, not the tilde) - helps you switch between different windows of the same application","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/how-to-avoid-using-the-mouse-trackpad.html","title":"How to switch apps/windows w/o using the mouse on a Mac"},{"text":"PyData 2013 It was a really rewarding experience going to PyData, and I am pleasantly surprised that the organizers have uploaded the talks already: https://vimeo.com/pydata/videos/page:1/sort:date Particularly good talks include: https://vimeo.com/63256380 - given by the CEO of Continuum Analytics, talking about the next-generation Python scientific computing package called Blaze , which is optimized for big data analysis. https://vimeo.com/63250251 - keynote talk by Fernando Perez on IPython. More on that later.","tags":"Uncategorized","loc":"https://karenyyng.github.io/posts/pydata-2013.html","title":"PyData 2013"}]}