Data Science Workspace
Describes how I usually set up my conda environment for data analytics.
I used to do a lot of data analysis. Nowadays? Not so much. Before I stop doing data-related work completely, it's probably a good idea to write down how my usual environment is set up.
You might also be interested in my infrastructure for document-parsing projects. Yes, I do have a Medium blog.

Conda Environment

I use conda to manage my packages, mostly because it's language-agnostic and popular among data scientists. As for distributions, I prefer Miniconda3 over Anaconda: I create a fresh virtual environment for each project anyway, so a default environment with gigabytes of unused packages would be a waste.
Here are some packages I generally install in every environment:
    conda install jupyterlab  # Provides the main web UI.
    conda install pandas seaborn tqdm  # The data science basics.
    conda install requests  # For retrieving data from the Internet in Python, e.g. occasional web scraping.
    conda install mongodb pymongo  # For storing temporary variables (see MongoDB below).
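A quick way to confirm that a fresh environment has everything is to check that each package can be found without actually importing it. This is a minimal sketch: the module list mirrors the commands above, `missing_requirements` is a hypothetical helper, and it assumes that the conda `mongodb` package puts the `mongod` server binary on `PATH`.

```python
import importlib.util
import shutil

# Python modules provided by the packages in the install commands above.
REQUIRED_MODULES = ("jupyterlab", "pandas", "seaborn", "tqdm", "requests", "pymongo")
# Executables; assumes the conda `mongodb` package installs the `mongod` server binary.
REQUIRED_EXECUTABLES = ("mongod",)


def missing_requirements(modules=REQUIRED_MODULES, executables=REQUIRED_EXECUTABLES):
    """Return modules that cannot be found and executables that are not on PATH."""
    # find_spec locates a top-level module without importing it; None means "not installed".
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    print(missing_requirements() or "environment looks complete")
```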
JupyterLab plugins to install:

MongoDB

When iteratively performing actions on multiple objects, I store temporary variables in MongoDB, so as to avoid filling up memory.
Organization. I run one MongoDB server instance for each entity I work for (e.g. WRDS, WWBP, and personal projects). Within each instance, I use one database per project and one collection per type of temporary variable.
Usage. I usually organize code into cells of the following form:
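This organization maps directly onto pymongo's client, database, and collection hierarchy. The following is a minimal sketch under stated assumptions: the `SERVERS` table and its host/port values are placeholders, `collection_address` and `open_collection` are hypothetical helpers, and reading the `190518` suffix in collection names such as `itemCleaned190518` as a YYMMDD date stamp is my guess.

```python
from datetime import date

# Hypothetical mapping from entity to its MongoDB server instance (placeholder hosts/ports).
SERVERS = {
    "personal": ("localhost", 27017),
    "WRDS": ("localhost", 27018),
    "WWBP": ("localhost", 27019),
}


def collection_address(entity, project, variable, day=None):
    """Return (host, port, database, collection) for one type of temporary variable.

    One server per entity, one database per project, one collection per variable
    type. If `day` is given, a YYMMDD suffix is appended (assumed naming convention).
    """
    host, port = SERVERS[entity]
    name = variable if day is None else f"{variable}{day:%y%m%d}"
    return host, port, project, name


def open_collection(entity, project, variable, day=None):
    """Open the collection with pymongo (needs pymongo installed and a running server)."""
    # Imported lazily so the naming helper above works even without pymongo.
    from pymongo import MongoClient

    host, port, database, name = collection_address(entity, project, variable, day)
    # A new client per call keeps the sketch short; in practice, reuse one client per entity.
    return MongoClient(host, port)[database][name]
```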
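    from multiprocessing import Pool
    from pymongo import MongoClient
    from tqdm import tqdm

    db = MongoClient()['project']  # One database per project; 'project' is a placeholder name.

    collection = db['item']
    criteria = {}  # Filter selecting which documents to process.
    N = collection.count_documents(criteria)
    cursor = collection.find(criteria)

    def work(doc):
        # ...
        return doc  # Return the processed document so it can be stored.

    collection = db['itemCleaned190518']  # Results go to a new collection.
    with Pool(processes=40) as p:
        with tqdm(total=N, desc='Clean items') as pbar:
            for doc in p.imap_unordered(work, cursor):
                pbar.update()
                collection.insert_one(doc)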
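The cell above writes each cleaned document to MongoDB as soon as it comes back, so the full result set never sits in a Python list. Below is a generalized sketch of the same pattern. `stream_results`, `sink`, and `batch_size` are hypothetical names, and batching the writes (e.g. passing each batch to pymongo's `insert_many` instead of calling `insert_one` per document) is my own variation, not part of the original cell. The sketch uses `multiprocessing.pool.ThreadPool`, which offers the same `imap_unordered` API as `multiprocessing.Pool`, so it runs anywhere, including inside a notebook. For CPU-bound cleaning, swap in `Pool` as in the original, with `work` defined as a picklable top-level function.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool


def stream_results(items, work, sink, processes=4, batch_size=100):
    """Apply `work` to each item in parallel and pass the results to `sink` in batches.

    The loop holds at most `batch_size` results before handing them to `sink`
    (e.g. a collection's insert_many) instead of collecting everything in one list.
    Returns the number of results written.
    """
    written = 0
    with ThreadPool(processes=processes) as pool:
        results = pool.imap_unordered(work, items)
        while True:
            # Take up to batch_size finished results; an empty batch means we're done.
            batch = list(islice(results, batch_size))
            if not batch:
                break
            sink(batch)
            written += len(batch)
    return written
```

With the original cell's names, a call might look like `stream_results(cursor, work, db['itemCleaned190518'].insert_many, processes=40)`. One caveat: `imap_unordered` may read ahead in its input and buffer finished results without limit, so `batch_size` caps what the loop holds, not what the pool holds internally.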