Here I introduce how to handle observational data stored in CDF files in Python.
%config InlineBackend.figure_format='retina'
from spacepy import pycdf
import math as mt
import matplotlib.pyplot as plt
import numpy as np
import scipy.fft as sf
import datetime
import bisect
import matplotlib.patches as patches
import matplotlib.dates as mdates
from IPython.display import display,Math
import matplotlib as mpl
import pandas as pd
plt.rcParams['font.size'] = 14
plt.rcParams['axes.linewidth'] = 1.5
You need to have the CDF C library and SpacePy installed in your environment.
The following websites are helpful for the installation:
https://pythonhosted.org/SpacePy/pycdf.html
https://stackoverflow.com/questions/37232008/how-read-common-data-formatcdf-in-python
http://shinkon-kyoto-ahaha.hatenablog.com/entry/2017/05/25/112338 (in Japanese)
acmfi=pycdf.CDF('ac_h0_mfi_20050121_v05.cdf')
acswe=pycdf.CDF('ac_h0_swe_20050121_v07.cdf')
Information about the CDF (Common Data Format) is available at https://cdf.gsfc.nasa.gov.
You can obtain data from CDAWeb: https://cdaweb.sci.gsfc.nasa.gov/index.html/.
The example data here are the magnetic field and solar wind parameters measured by ACE between 21-01-2005 and 22-01-2005.
print(acmfi)
BGSEc: CDF_REAL4 [5400, 3]
BGSM: CDF_REAL4 [5400, 3]
Epoch: CDF_EPOCH [5400]
Magnitude: CDF_REAL4 [5400]
Q_FLAG: CDF_INT4 [5400]
SC_pos_GSE: CDF_REAL4 [5400, 3]
SC_pos_GSM: CDF_REAL4 [5400, 3]
Time_PB5: CDF_INT4 [0, 3]
cartesian: CDF_CHAR*11 [3] NRV
dBrms: CDF_REAL4 [5400]
format_time: CDF_CHAR*2 [3] NRV
label_BGSE: CDF_CHAR*6 [3] NRV
label_bgsm: CDF_CHAR*8 [3] NRV
label_pos_GSE: CDF_CHAR*10 [3] NRV
label_pos_GSM: CDF_CHAR*10 [3] NRV
label_time: CDF_CHAR*27 [3] NRV
unit_time: CDF_CHAR*4 [3] NRV
print(acswe)
Epoch: CDF_EPOCH [1350]
Np: CDF_REAL4 [1350]
SC_pos_GSE: CDF_REAL4 [1350, 3]
SC_pos_GSM: CDF_REAL4 [1350, 3]
Time_PB5: CDF_INT4 [0, 3]
Tpr: CDF_REAL4 [1350]
V_GSE: CDF_REAL4 [1350, 3]
V_GSM: CDF_REAL4 [1350, 3]
V_RTN: CDF_REAL4 [1350, 3]
Vp: CDF_REAL4 [1350]
alpha_ratio: CDF_REAL4 [1350]
format_time: CDF_CHAR*2 [3] NRV
label_V_GSE: CDF_CHAR*8 [3] NRV
label_V_GSM: CDF_CHAR*8 [3] NRV
label_V_RTN: CDF_CHAR*8 [3] NRV
label_pos_GSE: CDF_CHAR*10 [3] NRV
label_pos_GSM: CDF_CHAR*10 [3] NRV
label_time: CDF_CHAR*27 [3] NRV
unit_time: CDF_CHAR*4 [3] NRV
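pycdf already converts the Epoch variable (type CDF_EPOCH) into Python datetime objects, which is why the DataFrame index built later is human-readable. For reference, a raw CDF_EPOCH value is a double counting milliseconds since 0000-01-01T00:00:00. The sketch below converts such raw values by hand with the standard library only. It assumes naive UTC datetimes and, like CDF_EPOCH itself, ignores leap seconds; the function names are hypothetical helpers, not part of spacepy.

```python
from datetime import datetime, timedelta

# CDF_EPOCH counts milliseconds since 0000-01-01T00:00:00.
# Python's datetime cannot represent year 0, so we go through
# 1970-01-01, whose CDF_EPOCH value is 62167219200000 ms.
CDF_EPOCH_1970_MS = 62167219200000.0
UNIX_EPOCH = datetime(1970, 1, 1)


def cdf_epoch_to_datetime(epoch_ms):
    """Convert a raw CDF_EPOCH value (ms, float) to a naive UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=epoch_ms - CDF_EPOCH_1970_MS)


def datetime_to_cdf_epoch(dt):
    """Inverse of cdf_epoch_to_datetime for naive UTC datetimes."""
    return (dt - UNIX_EPOCH) / timedelta(milliseconds=1) + CDF_EPOCH_1970_MS


# Round trip for the first magnetic-field time stamp of the example file
t = datetime(2005, 1, 21, 0, 0, 6)
ms = datetime_to_cdf_epoch(t)
print(ms, cdf_epoch_to_datetime(ms))
```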
print(acmfi['BGSEc'].meta)
AVG_TYPE:  [CDF_CHAR]
CATDESC: Magnetic Field Vector in GSE Cartesian coordinates (16 sec) [CDF_CHAR]
DEPEND_0: Epoch [CDF_CHAR]
DEPEND_1: cartesian [CDF_CHAR]
DICT_KEY: magnetic_field [CDF_CHAR]
DISPLAY_TYPE: time_series [CDF_CHAR]
FIELDNAM: Mag Field vector, GSE coord [CDF_CHAR]
FILLVAL: -1e+31 [CDF_REAL4]
FORMAT: F9.3 [CDF_CHAR]
LABL_PTR_1: label_BGSE [CDF_CHAR]
SCALEMAX: [25. 25. 25.] [CDF_REAL4]
SCALEMIN: [-25. -25. -25.] [CDF_REAL4]
UNITS: nT [CDF_CHAR]
VALIDMAX: [65534. 65534. 65534.] [CDF_REAL4]
VALIDMIN: [-65534. -65534. -65534.] [CDF_REAL4]
VAR_NOTES:  [CDF_CHAR]
VAR_TYPE: data [CDF_CHAR]
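The metadata above already tells you which values are not real measurements: FILLVAL (-1e+31) and the range VALIDMIN to VALIDMAX. A minimal sketch, assuming you want to turn those into NaN directly from the attributes instead of choosing a threshold by hand; `mask_invalid` is a hypothetical helper, and the array below is synthetic data shaped like BGSEc.

```python
import numpy as np


def mask_invalid(data, fillval=None, validmin=None, validmax=None):
    """Return a float copy of `data` with fill / out-of-range values set to NaN.

    fillval, validmin and validmax are meant to come from the CDF variable
    attributes FILLVAL, VALIDMIN and VALIDMAX; validmin/validmax may be
    per-component arrays that broadcast against the last axis of `data`.
    """
    out = np.array(data, dtype=float)  # copy, so the input is untouched
    bad = np.zeros(out.shape, dtype=bool)
    if fillval is not None:
        # FILLVAL is stored as CDF_REAL4 (float32), so -1e31 is not exactly
        # representable; compare with a relative tolerance, not equality.
        bad |= np.isclose(out, float(fillval), rtol=1e-6, atol=0.0)
    if validmin is not None:
        bad |= out < np.asarray(validmin, dtype=float)
    if validmax is not None:
        bad |= out > np.asarray(validmax, dtype=float)
    out[bad] = np.nan
    return out


# Synthetic (N, 3) field with one fill value and one out-of-range value
b = np.array([[0.662, -1.584, -1e31],
              [0.805, 70000.0, 3.267]], dtype=np.float32)
clean = mask_invalid(b, fillval=np.float32(-1e31),
                     validmin=[-65534.0] * 3, validmax=[65534.0] * 3)
print(clean)
```

With the real file, the attributes could be passed as, for example, `meta = acmfi['BGSEc'].meta` followed by `mask_invalid(acmfi['BGSEc'][:], meta['FILLVAL'], meta['VALIDMIN'], meta['VALIDMAX'])`; this call is a suggestion and has not been run against the file here.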
#indx=['day','hour','min','sec','np','tp','v','vx','vy','vz','bx','by','bz','b']
#df = pd.read_csv('wind_plasma_magnetic_field_magnetic_cloud_19980304.txt', header=None, delimiter=r"\s+", names=indx)
#date1=pd.DataFrame({'year': 1998, 'month': 3, 'day':df['day'], 'hour':df['hour'], 'minute':df['min'], 'second':df['sec']})
#date2=pd.to_datetime(date1)
#
#d = {'np':df['np'], 'tp':df['tp'][:], 'vx':df['vx'][:], 'vy':df['vy'][:], 'vz':df['vz'][:], 'v':df['v'][:], 'bx':df['bx'][:], 'by':df['by'][:], 'bz':df['bz'][:], 'b':df['b'][:]}
#df = pd.DataFrame(data=d)
#df.index=date2
#df.index.name='date'
#
#df.head()
### convert to the dataframe of pandas ###
d = {'bx': acmfi['BGSEc'][:,0], 'by': acmfi['BGSEc'][:,1], 'bz':acmfi['BGSEc'][:,2]}
df0 = pd.DataFrame(data=d,index=acmfi['Epoch'][:])
df0.index.name='time'
##########################################
print(df0.head())
### convert to the dataframe of pandas ###
d = {'vx': acswe['V_GSE'][:,0], 'vy': acswe['V_GSE'][:,1], 'vz':acswe['V_GSE'][:,2], 'np':acswe['Np'][:]}
df1 = pd.DataFrame(data=d,index=acswe['Epoch'][:])
df1.index.name='time'
##########################################
print(df1.head())
                        bx     by     bz
time
2005-01-21 00:00:06  0.662 -1.584  3.374
2005-01-21 00:00:22  0.805 -1.896  3.267
2005-01-21 00:00:38  0.785 -1.734  3.308
2005-01-21 00:00:54  0.829 -1.761  3.329
2005-01-21 00:01:10  1.065 -1.898  3.193

                               vx            vy            vz            np
time
2005-01-21 00:00:22 -1.000000e+31 -1.000000e+31 -1.000000e+31  1.000000e+01
2005-01-21 00:01:26 -1.000000e+31 -1.000000e+31 -1.000000e+31  0.000000e+00
2005-01-21 00:02:30 -1.000000e+31 -1.000000e+31 -1.000000e+31  1.000000e+01
2005-01-21 00:03:34 -1.000000e+31 -1.000000e+31 -1.000000e+31  1.000000e+01
2005-01-21 00:04:38 -1.000000e+31 -1.000000e+31 -1.000000e+31 -1.000000e+31
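The two DataFrames have different time stamps: the magnetic field comes every 16 s (see CATDESC in the metadata) and the plasma data roughly every 64 s. If you want to compare them sample by sample, one option (not used in the original notebook) is `pandas.merge_asof`, which picks for each plasma time the nearest magnetic-field time. The sketch uses small synthetic frames standing in for df0 and df1; the values are made up, and the 8 s tolerance (half the field cadence) is an arbitrary choice.

```python
import pandas as pd

# Synthetic stand-in for df0: 16-s magnetic field (values are made up)
t_b = pd.date_range('2005-01-21 00:00:06', periods=12, freq='16s')
b = pd.DataFrame({'bz': range(12)}, index=t_b)
b.index.name = 'time'

# Synthetic stand-in for df1: ~64-s plasma density
t_v = pd.DatetimeIndex(['2005-01-21 00:00:22', '2005-01-21 00:01:26',
                        '2005-01-21 00:02:30'], name='time')
v = pd.DataFrame({'np': [10.0, 0.0, 10.0]}, index=t_v)

# For each plasma sample, attach the nearest field sample in time,
# but only if it lies within 8 s; otherwise the field column gets NaN.
# Both indexes must be sorted for merge_asof.
merged = pd.merge_asof(v, b, left_index=True, right_index=True,
                       direction='nearest', tolerance=pd.Timedelta('8s'))
print(merged)
```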
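# register pandas' datetime converters explicitly (avoids the FutureWarning in older pandas)
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
fig,ax=plt.subplots(figsize=(16,4))
ax.plot(df0.index, df0['bx'],'-',lw=2,label='bx')
ax.plot(df0.index, df0['by'],'-',lw=2,label='by')
ax.plot(df0.index, df0['bz'],'-',lw=2,label='bz')
# use Timestamps rather than plain strings for the time-axis limits
ax.set_xlim(pd.Timestamp('2005-01-21 15:00:00'), pd.Timestamp('2005-01-21 18:00:00'))
ax.set_ylim(-40,40)
ax.set_ylabel('B (GSE) [nT]')
ax.legend()
myfmt=mdates.DateFormatter('%H:%M:%S\n%d %b %Y')
ax.xaxis.set_major_formatter(myfmt)
plt.show()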
If you find missing (or unphysical, such as the fill value -1e31) data values, as in the density Np below,
fig,ax=plt.subplots(figsize=(16,4))
ax.plot(df1.index, df1['np'],'-',lw=2)
# use Timestamps rather than plain strings for the time-axis limits
ax.set_xlim(pd.Timestamp('2005-01-21 00:00:00'), pd.Timestamp('2005-01-22 00:00:00'))
ax.set_ylim(0,70)
myfmt=mdates.DateFormatter('%H:%M:%S\n%d %b %Y')
ax.xaxis.set_major_formatter(myfmt)
plt.show()
an easy way is to replace the missing data with NaN, either with np.where and np.nan, or with the pandas DataFrame.mask method as in the code below.
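The two options give the same result. The small synthetic table below (values made up, containing the CDF fill value -1e31) shows both, using the same arbitrary threshold of 1e30 as the notebook: `np.where` works on one column at a time and returns a NumPy array, while `DataFrame.mask` replaces matching entries with NaN in every column at once and keeps the DataFrame.

```python
import numpy as np
import pandas as pd

# Synthetic plasma values containing the CDF fill value -1e31
df = pd.DataFrame({'vx': [-1e31, -420.0, -415.0],
                   'np': [10.0, -1e31, 12.0]})

# 1) NumPy: np.where(condition, value_if_true, value_if_false) on one column
np_clean = np.where(np.abs(df['np']) > 1e30, np.nan, df['np'])

# 2) pandas: DataFrame.mask sets entries where the condition is True
#    to NaN (the default replacement), for all columns at once
df_clean = df.mask(df.abs() > 1e30)

print(np_clean)
print(df_clean)
```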
#extract the data to modify
#data=np.where(np.abs(acswe['Np'][:])>100, np.nan, acswe['Np'][:]) #Here, 100 is an arbitrary value.
# replace the fill values (-1e31) in all columns with NaN
df1=df1.mask(abs(df1)>1e30)
### fill missing datetime ###
#df=df.resample('92ms').first().interpolate('linear')
#df
######################################
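The commented-out line "fill missing datetime" puts the data on a regular time grid and fills the gaps by linear interpolation. A minimal sketch of the same idea on synthetic data, assuming a 64 s grid to match the plasma cadence seen in the head() output (the commented line uses 92 ms instead). `origin='start'` aligns the grid with the first sample and needs pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Synthetic ~64-s density with one NaN (a masked fill value) and one
# record missing entirely (00:02:30); the values are made up.
t = pd.DatetimeIndex(['2005-01-21 00:00:22', '2005-01-21 00:01:26',
                      '2005-01-21 00:03:34', '2005-01-21 00:04:38'])
density = pd.Series([10.0, np.nan, 12.0, 13.0], index=t)

# 1) regular 64-s grid starting at the first sample; .first() takes the
#    first non-NaN value in each bin, and empty bins become NaN
regular = density.resample('64s', origin='start').first()
# 2) fill the NaNs by linear interpolation between neighbouring points
filled = regular.interpolate(method='linear')
print(filled)
```

Interpolating across long gaps invents data, so in practice you may want to pass `limit=` to `interpolate` to fill only short gaps.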
fig,ax=plt.subplots(figsize=(16,4))
ax.plot(df1.index, df1['np'],'-',lw=2)
# use Timestamps rather than plain strings for the time-axis limits
ax.set_xlim(pd.Timestamp('2005-01-21 00:00:00'), pd.Timestamp('2005-01-22 00:00:00'))
ax.set_ylim(0,70)
myfmt=mdates.DateFormatter('%H:%M:%S\n%d %b %Y')
ax.xaxis.set_major_formatter(myfmt)
plt.show()
Note: spacepy (pycdf) variables do not support NumPy 'fancy indexing' (indexing with a boolean array or a list of indices).
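A workaround, assuming it is acceptable to read the whole variable into memory, is to slice it with `[:]` first, as the commented-out np.where line does, and then index the resulting NumPy array. `select` is a hypothetical helper. A plain NumPy array stands in for `acswe['Np']` so the sketch runs without the CDF file.

```python
import numpy as np


def select(var, index):
    """Fancy-index a CDF variable by reading it into memory first.

    `var` is assumed to be a spacepy pycdf.Var (or anything else that
    supports var[:]); var[:] gives a NumPy array, on which boolean masks
    and integer lists work as usual.
    """
    return np.asarray(var[:])[index]


# Stand-in for acswe['Np'] (values taken from the head() output above)
np_values = np.array([10.0, 0.0, 10.0, -1e31])
good = np.abs(np_values) < 1e30        # boolean mask
print(select(np_values, good))         # boolean indexing
print(select(np_values, [0, 2]))       # integer-list indexing
```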